CLASSIFICATION AND PROGNOSIS PREDICTION OF ACUTE
LYMPHOBLASTIC LEUKEMIA BY GENE EXPRESSION PROFILING 



FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT 
The research underlying this invention was supported in part with funds from
National Institutes of Health grants P01 CA71907-06, CA51001, CA36401,
CA78224, Cancer Center CORE Grant CA-21765, and National Science Foundation
grant EIA-0074869. The United States Government may have an interest in the
subject matter of the invention.

BACKGROUND OF THE INVENTION 
Pediatric acute lymphoblastic leukemia (ALL) is one of the great success
stories of modern cancer therapy, with contemporary treatment protocols achieving
overall long-term event-free survival rates approaching 80% (Schrappe et al. (2000)
Blood 95:3310-22; Silverman et al. (2001) Blood 97:1211-18; and Pui and Evans
(1998) N. Engl. J. Med. 339:605-15). This success has been achieved in part by using
risk-adapted therapy that involves tailoring the intensity of treatment to each patient's
risk of relapse. This approach was developed following the realization that pediatric
ALL is a heterogeneous disease consisting of various leukemia subtypes that differ
markedly in their response to chemotherapy (reviewed in Pui and Evans (1998) N.
Engl. J. Med. 339:605-15). By tailoring the intensity of treatment to a patient's
relative risk of relapse, patients are neither under-treated nor over-treated, and are thus
afforded the highest chance for a cure.

Critical to the success of this approach has been the accurate assignment of
individual patients to specific risk groups. Although risk assignment is influenced by
a variety of clinical and laboratory parameters, the genetic alterations that underlie the
pathogenesis of individual leukemia subtypes figure prominently in most
classification schemes (Silverman et al. (2001) Blood 97:1211-18; and Pui and
Evans (1998) N. Engl. J. Med. 339:605-15). Through systematic immunophenotyping
and cytogenetic analysis, and the subsequent molecular cloning of the genes targeted
by the identified chromosomal rearrangements, a number of genetically distinct
leukemia subtypes have been defined. These include B-lineage leukemias that
contain t(9;22)[BCR-ABL], t(1;19)[E2A-PBX1], t(12;21)[TEL-AML1],
rearrangements in the MLL gene on chromosome 11, band q23, or a hyperdiploid
karyotype (i.e., >50 chromosomes), and T-lineage leukemias (T-ALL) (Silverman et
al. (2001) Blood 97:1211-18; and Pui and Evans (1998) N. Engl. J. Med. 339:605-15).
The underlying genetic lesions in these leukemia subtypes influence the response to
cytotoxic drugs. For example, leukemias that express the E2A-PBX1 fusion protein
respond poorly to conventional antimetabolite-based treatment, but have cure rates
approaching 80% when treated with more intensive therapies (Raimondi et al. (1990)
J. Clin. Oncol. 8:1380-88; and Hunger (1996) Blood 87:1211-1224). Similarly,
BCR-ABL-expressing ALLs and ALLs in infants with MLL rearrangements have
exceedingly poor cure rates with conventional chemotherapy, and allogeneic
hematopoietic stem cell transplantation from an HLA-matched sibling donor has
already been shown to improve outcome for patients with the former leukemia subtype
(Pui et al. (1991) Blood 77:440-46; Heerema et al. (1999) Leukemia 13:679-86; Arico
et al. (2000) N. Engl. J. Med. 342:998-1006; and Biondi et al. (2000) Blood 96:24-33).

Unfortunately, the accurate assignment of patients to specific risk groups is a
difficult and expensive process, requiring intensive laboratory studies including
immunophenotyping, cytogenetics, and molecular diagnostics (Pui and Evans (1998)
N. Engl. J. Med. 339:605-15; and Pui et al. (2001) Lancet Oncology 2:597-607).
Moreover, these diagnostic approaches require the collective expertise of a number of
professionals, and although this expertise is available at most major medical centers, it
is generally unavailable in developing countries. Accordingly, there remains a need
for rapid, less expensive methods of assigning patients affected by ALL into known
leukemia risk groups and identifying patients for whom there is a high risk that
conventional therapeutic approaches will fail.






BRIEF SUMMARY OF THE INVENTION 
The present invention provides methods and compositions useful for
diagnosing and choosing treatment for subjects affected by leukemia. The claimed
methods include methods of assigning a subject affected by leukemia to a leukemia
risk group, methods of predicting whether a subject affected by leukemia has an
increased risk of relapse, methods of predicting whether a subject affected by
leukemia has an increased risk of developing secondary acute myeloid leukemia
(AML), methods to aid in the determination of a prognosis for a subject affected by
leukemia, methods of choosing a therapy for a subject affected by leukemia, and
methods of monitoring the disease state in a subject undergoing one or more therapies
for leukemia. Methods of screening test compounds to identify therapeutic
compounds useful for the treatment of leukemia and molecular targets for these
therapeutic compounds are also provided.

The claimed methods comprise providing an expression profile of a sample
from a subject affected by leukemia and comparing this subject expression profile to
one or more reference expression profiles. In one embodiment, the reference profiles
are associated with leukemia risk groups, and the subject expression profile is
compared to one or more of these risk group reference profiles to thereby assign the
subject affected by leukemia to a leukemia risk group. In another embodiment, one or
more reference profiles are associated with relapse of leukemia and the subject
expression profile is compared to one or more of these relapse reference profiles to
determine if the subject has an increased risk of relapse. In yet another embodiment,
one or more reference profiles are associated with secondary AML, and the subject
expression profile is compared to one or more of these reference profiles to determine
whether the subject has an increased risk of developing secondary AML.

The present invention also provides compositions useful for diagnosing and
choosing a therapy for subjects affected by leukemia. These compositions include
arrays comprising a plurality of capture probes that can bind specifically to nucleic
acid molecules that are differentially expressed in leukemia risk groups, in leukemia
subjects who have relapsed, or in leukemia subjects who have developed secondary
AML. Also provided is a computer-readable medium comprising digitally-encoded
expression profiles comprising values representing the expression levels of genes that
are differentially expressed in leukemia risk groups, in leukemia subjects who have
relapsed, or in leukemia subjects who have developed secondary AML. Additional
compositions of the invention include kits comprising an array of capture probes that
can bind specifically to nucleic acid molecules that are differentially expressed in
leukemia risk groups, in leukemia subjects who have relapsed, or in leukemia subjects
who have developed secondary AML, and a computer-readable medium having
digitally encoded expression profiles with values representing the expression level of
a nucleic acid molecule detected by the array.


DETAILED DESCRIPTION OF THE INVENTION 
The present invention provides a single platform, expression analysis, that can
accurately identify each of the known prognostically and therapeutically relevant
subgroups of leukemia and predict the risk of relapse and the risk of secondary
(therapy-induced) AML in patients having leukemia. The methods and compositions
of the invention provide tools useful in choosing a therapy for leukemia patients,
including methods for assigning a leukemia patient to a leukemia risk group, methods
of predicting whether a leukemia patient has an increased risk of relapse, methods of
predicting whether a leukemia patient has an increased risk of developing secondary
(therapy-induced) AML, methods of choosing a therapy for a leukemia patient,
methods of determining the efficacy of a therapy in a leukemia patient, and methods
of determining the prognosis for a leukemia patient.

The methods of the invention comprise the steps of providing an expression
profile from a sample from a subject affected by leukemia and comparing this subject
expression profile to one or more reference profiles that are associated with a
particular physiologic condition, such as a leukemia risk group, the occurrence of
relapse, or the development of secondary AML. By identifying the leukemia risk
group reference profile that is most similar to the subject expression profile, the
subject can be assigned to a leukemia risk group. Similarly, the risk that a subject
affected by leukemia will relapse or develop secondary AML can be predicted by
determining whether the expression profile from the subject is sufficiently similar to a
reference profile associated with relapse or a reference profile associated with the
development of secondary AML.

In another embodiment, the subject expression profile is from a subject affected by
leukemia who is undergoing a therapy to treat the leukemia. The subject expression
profile is compared to one or more reference expression profiles of the invention to
monitor the efficacy of the therapy.

Expression Profiles

As used herein, an "expression profile" comprises one or more values
corresponding to a measurement of the relative abundance of a gene expression
product. Such values may include measurements of RNA levels or protein
abundance. Thus, the expression profile can comprise values representing the
measurement of the transcriptional state or the translational state of the gene. See
U.S. Pat. Nos. 6,040,138, 5,800,992, 6,020,135, 6,344,316, and 6,033,860, which are
hereby incorporated by reference in their entireties.
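
By way of non-limiting illustration, an expression profile as defined above can be
held in software as a mapping from a gene identifier to a measured value. The
following Python sketch is hypothetical: the class name ExpressionProfile, its
fields, the sample identifier, and the numeric values are assumptions made for
illustration only. The two accession numbers are used merely as example keys.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ExpressionProfile:
    """Hypothetical container: one value per gene expression product."""
    sample_id: str
    kind: str  # "RNA" (transcriptional state) or "protein" (translational state)
    values: Dict[str, float] = field(default_factory=dict)

    def vector(self, genes: List[str]) -> List[float]:
        """Return the values in a fixed gene order so profiles can be compared."""
        return [self.values[g] for g in genes]


# Illustrative numbers only; they are not measurements from the disclosure.
profile = ExpressionProfile(
    sample_id="subject-001",
    kind="RNA",
    values={"AA919102": 1520.0, "AL049381": 87.5},
)
ordered = profile.vector(["AL049381", "AA919102"])
```

Fixing the gene order in one place means that a subject profile and a reference
profile are always compared value for value on the same genes.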

The transcriptional state of a sample includes the identities and relative 
abundance of the RNA species, especially mRNAs, present in the sample. Preferably,
a substantial fraction of all constituent RNA species in the sample are measured, but 
at least a sufficient fraction to characterize the transcriptional state of the sample is 
measured. The transcriptional state can be conveniently determined by measuring 
transcript abundance by any of several existing gene expression technologies. 

Translational state includes the identities and relative abundance of the 
constituent protein species in the sample. As is known to those of skill in the art, the 
transcriptional state and translational state are related. 

In some embodiments, the expression profiles of the present invention are
generated from samples from subjects affected by leukemia, including subjects having
leukemia, subjects suspected of having leukemia, subjects having a propensity to
develop leukemia, subjects who have previously had leukemia, or subjects
undergoing therapy for leukemia. The samples from the subject used to generate the
expression profiles of the present invention can be derived from a variety of sources 
including, but not limited to, single cells, a collection of cells, tissue, cell culture, 
bone marrow, blood, or other bodily fluids. The tissue or cell source may include a 
tissue biopsy sample, a cell sorted population, cell culture, or a single cell. Sources 
for the sample of the present invention include cells from peripheral blood or bone 
marrow, such as blast cells from peripheral blood or bone marrow. 

In selecting a sample, the percentage of the sample that constitutes cells 
having differential gene expression in leukemia risk groups, relapse, or secondary 
AML should be considered. Samples may comprise at least 20%, at least 30%, at 
least 40%, at least 50%, at least 55%, at least 60%, at least 70%, at least 75%, at least 
80%, at least 85%, at least 90%, or at least 95% cells having differential expression in 
leukemia risk groups, relapse, or secondary AML, with a preference for samples
having a higher percentage of such cells. In some embodiments, these cells are blast
cells, such as leukemic cells. The percentage of a sample that constitutes blast cells
may be determined by methods well known in the art; see, for example, the methods
described elsewhere herein.

In some embodiments of the present invention, the expression profiles 
comprise values representing the expression levels of genes that are differentially 
expressed in leukemia risk groups, in subjects affected by leukemia who have 
relapsed, or in subjects affected by leukemia who have developed secondary AML. 
The term "differentially expressed" as used herein means that the measurement of a 
cellular constituent varies in two or more samples. The cellular constituent may be 
upregulated in a sample from a subject having one physiologic condition in 
comparison with a sample from a subject having a different physiologic condition, or 
downregulated in a sample from a subject having one physiologic condition in
comparison with a sample from a subject having a different physiologic condition.
For example, in one embodiment, the differentially expressed genes of the present
invention may be expressed at different levels in different leukemia risk groups. In
another embodiment, the differentially expressed genes are expressed at different
levels in subjects affected by leukemia who will relapse after conventional treatment
in comparison with subjects affected by leukemia who will not relapse and thus will
remain in continuous complete remission. In yet another embodiment, the
differentially expressed genes are expressed at different levels in subjects affected by
leukemia who will develop secondary AML in comparison with subjects affected by
leukemia who will not develop secondary AML.

The present invention provides groups of genes that are differentially
expressed in diagnostic leukemia samples of patients in different risk groups, or in
patients that go on to develop a relapse or a therapy-induced (secondary) AML. Some
of these genes were identified based on gene expression levels for 12,600 probes in
360 leukemia samples. Values representing the expression levels of the nucleic acid
molecules detected by the probes were analyzed using five different statistical metrics
to identify genes that were differentially expressed in leukemia risk groups. The
methods used to analyze the expression level values to identify differentially
expressed genes were the Chi-square statistics method, the Correlation-based Feature
Selection method, the T-statistics method, the Wilkins' method, and the
self-organizing map and discriminant analysis with variance metric. Although different
methods of analysis resulted in the selection of different groups of differentially
expressed genes, the genes selected by each method could be used to create an
expression profile that could accurately determine whether a leukemia patient should
be assigned to a risk group, with an overall diagnostic accuracy of about 96%. See
the Experimental section.

Additional genes that are differentially expressed in diagnostic leukemia
samples were identified based on gene expression levels for 26,825 probes in a subset
of 132 leukemia samples selected from the 360 leukemia samples described above. A
chi-square metric followed by a permutation test was used to identify discriminating
genes for the T-ALL, E2A-PBX1, TEL-AML1, BCR-ABL, MLL rearrangement, and
Hyperdiploid >50 chromosomes risk groups. Genes whose expression is limited to a
single B-cell lineage were also identified, and are provided in Tables 70-74.

Thus, distinct sets of differentially expressed genes that can be used to
distinguish the T-lineage, hyperdiploid >50 chromosomes, BCR-ABL, E2A-PBX1,
TEL-AML1, and MLL gene rearrangement risk groups are provided. Examples of
genes that are differentially expressed in the T-ALL risk group are shown in Tables 7,
14, 21, 28, 35, 59, and 67. Examples of genes that are differentially expressed in the
E2A-PBX1 risk group are shown in Tables 3, 10, 17, 24, 31, 55, 64, and 71.
Examples of genes that are differentially expressed in the TEL-AML1 risk group are
shown in Tables 8, 15, 22, 29, 36, 60, 68, and 74. Examples of genes that are
differentially expressed in the BCR-ABL risk group are shown in Tables 2, 9, 16, 23,
30, 54, 63, and 70. Examples of genes that are differentially expressed in the MLL
risk group are shown in Tables 5, 12, 19, 26, 33, 57, 66, and 73. Examples of genes
that are differentially expressed in the Hyperdiploid >50 risk group are shown in
Tables 4, 11, 18, 25, 32, 56, 65, and 72.

The present invention further provides a seventh leukemia risk group, herein
termed "Novel," that can be distinguished from the previously-described leukemia
risk groups based on expression profiling. The expression profiles from subjects in
the Novel risk group are distinguishable from those of the T-ALL, E2A-PBX1,
TEL-AML1, BCR-ABL, MLL, and Hyperdiploid >50 risk groups. Subjects assigned to the
Novel risk group have similar expression profiles. Examples of genes that are
differentially expressed in the Novel leukemia risk group are shown in Tables 4, 11,
18, 25, 32, and 58.

WO 03/083140 PCT/US03/08486

Similarly, sets of differentially expressed genes associated with leukemia
patients in the T-ALL, Hyperdiploid >50, TEL-AML1, MLL, and Other (i.e., not the
T-ALL, hyperdiploid >50, TEL-AML1, MLL, E2A-PBX1, or BCR-ABL) risk groups
who have undergone relapse were identified. Examples of differentially expressed
genes associated with relapse in subjects in the T-ALL risk group are shown in Table
44. Examples of differentially expressed genes associated with relapse in subjects in
the hyperdiploid >50 risk group are shown in Table 45. Examples of differentially
expressed genes associated with relapse in subjects in the TEL-AML1 risk group are
shown in Table 46. Examples of differentially expressed genes associated with relapse
in subjects in the MLL risk group are shown in Table 47. Examples of differentially
expressed genes associated with relapse in subjects in the E2A-PBX1, BCR-ABL, and
Novel risk groups are shown in Table 48.

The invention also provides genes that are differentially expressed in subjects
affected by TEL-AML1 who have developed secondary (treatment-induced) AML.
Examples of such genes are shown in Table 52.

The present invention also reveals genes with a high differential level of
expression in leukemic compared to normal cells. These highly differentially
expressed genes are selected from the genes shown in Tables 2-36, 44-48, 63-68,
and 70-74. These genes and their expression products are useful as markers to detect
the presence of minimal residual disease (MRD) in a patient. Antibodies or other
reagents or tools may be used to detect the presence of these telltale markers of MRD.

The expression profiles of the invention comprise one or more values
representing the expression level of a gene having differential expression in a
leukemia risk group, in subjects affected by leukemia who will relapse after
conventional therapy, or in subjects affected by leukemia who will develop secondary
AML after conventional therapy. Each expression profile contains a sufficient
number of values such that the profile can be used to distinguish one leukemia risk
group from another, or to distinguish subjects who will relapse after conventional
therapy from those who will not relapse, or to distinguish subjects who will develop
secondary AML after conventional therapy from those who will not develop
secondary AML. In some embodiments, the expression profiles comprise only one
value. For example, it can be determined whether a subject affected by leukemia is in
the T-ALL risk group based only on the expression level of the CD3D antigen (NCBI
Accession No. AA919102; see Table 14). Similarly, it can be determined whether a
subject affected by leukemia is in the E2A-PBX1 risk group based only on the
expression level of the cDNA of NCBI Accession No. AL049381 (see Table 10). In
other embodiments, the expression profile comprises more than one value
corresponding to a differentially expressed gene, for example at least 2 values, at least
3 values, at least 4 values, at least 5 values, at least 6 values, at least 7 values, at least
8 values, at least 9 values, at least 10 values, at least 11 values, at least 12 values, at
least 13 values, at least 14 values, at least 15 values, at least 16 values, at least 17
values, at least 18 values, at least 19 values, at least 20 values, at least 22 values, at
least 25 values, at least 27 values, at least 30 values, at least 35 values, at least 40
values, at least 45 values, at least 50 values, at least 75 values, at least 100 values, at
least 125 values, at least 150 values, at least 175 values, at least 200 values, at least
250 values, at least 300 values, at least 400 values, at least 500 values, at least 600
values, at least 700 values, at least 800 values, at least 900 values, at least 1000
values, at least 1200 values, at least 1500 values, or at least 2000 or more values.

It is recognized that the diagnostic accuracy of assigning a subject to a
leukemia risk group, determining whether a subject has an increased risk for relapse,
or determining whether a subject has an increased risk of developing secondary AML
will vary based on the number of values contained in the expression profile.
Generally, the number of values contained in the expression profile is selected such
that the diagnostic accuracy is at least 85%, at least 87%, at least 90%, at least 91%, at
least 92%, at least 93%, at least 94%, at least 95%, at least 96%, at least 97%, at least
98%, or at least 99%, as calculated using methods described elsewhere herein, with an
obvious preference for higher percentages of diagnostic accuracy.

It is also recognized that the diagnostic accuracy of assigning a subject to a
leukemia risk group, determining whether a subject has an increased risk for relapse,
or determining whether a subject has an increased risk of developing secondary AML
will vary based on the strength of the correlation between the expression levels of the
differentially expressed genes and the associated physiologic condition. When the
values in the expression profiles represent the expression levels of genes whose
expression is strongly correlated with the physiologic condition, it may be possible to
use fewer values in the expression profile and still obtain an acceptable
level of diagnostic or prognostic accuracy.

The strength of the correlation between the expression level of a differentially
expressed gene and the presence or absence of a particular physiologic state may be
determined by a statistical test of significance. For example, the chi-square test used
to select genes in some embodiments of the present invention assigns a chi-square
value to each differentially expressed gene, indicating the strength of the correlation
of the expression of that gene and the presence or absence of the associated
physiologic condition. Similarly, the T-statistics metric and the Wilkins' metric both
provide a value or score indicative of the strength of the correlation between the
expression of the gene and the absence or presence of the associated physiologic
conditions. These scores may be used to select the genes whose expression levels
have the greatest correlation with a particular physiologic state in order to increase the
diagnostic or prognostic accuracy of the methods of the invention, or in order to
reduce the number of values contained in the expression profile while maintaining the
diagnostic or prognostic accuracy of the expression profile.

For example, in one embodiment the chi-square test is used to determine the
significance of the differentially expressed genes whose expression levels are
included in the array, and only those genes having a chi-square value of more than 20,
more than 25, more than 30, more than 35, more than 40, more than 45, more than 50,
more than 55, more than 60, more than 65, more than 70, more than 75, more than 80,
more than 90, more than 100, more than 120, more than 140, more than 160, more
than 180, or more than 200 are selected.
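
By way of non-limiting illustration, the thresholding described above can be
sketched in Python. The sketch assumes, purely for illustration, that each gene's
expression values are first split into "high" and "low" at a per-gene cut point and
that the chi-square value is the ordinary Pearson statistic of the resulting 2x2
table against risk group membership. The function names, the cut points, and the
sample values are hypothetical and are not taken from the Experimental section.

```python
from typing import Dict, List


def chi_square_2x2(high: List[bool], in_group: List[bool]) -> float:
    """Pearson chi-square statistic for a 2x2 table (no continuity correction)."""
    pairs = list(zip(high, in_group))
    a = sum(h and g for h, g in pairs)                # high, in group
    b = sum(h and not g for h, g in pairs)            # high, not in group
    c = sum((not h) and g for h, g in pairs)          # low, in group
    d = sum((not h) and (not g) for h, g in pairs)    # low, not in group
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # a constant row or column carries no information
    return n * (a * d - b * c) ** 2 / denom


def select_genes(expr: Dict[str, List[float]], in_group: List[bool],
                 cut: Dict[str, float], threshold: float) -> Dict[str, float]:
    """Keep only genes whose chi-square value exceeds the threshold."""
    selected = {}
    for gene, levels in expr.items():
        score = chi_square_2x2([x > cut[gene] for x in levels], in_group)
        if score > threshold:
            selected[gene] = score
    return selected


# Made-up data: 10 samples in the risk group, 20 samples outside it.
in_group = [True] * 10 + [False] * 20
expr = {
    "gene_a": [900.0] * 10 + [50.0] * 20,  # high only in the risk group
    "gene_b": [900.0 if i % 2 == 0 else 50.0 for i in range(30)],  # unrelated
}
cut = {"gene_a": 500.0, "gene_b": 500.0}
selected = select_genes(expr, in_group, cut, threshold=20.0)
```

In this toy data set gene_a separates the group perfectly and is retained, while
gene_b is unrelated to group membership and is discarded.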

In another embodiment, the T-statistics metric is used to determine the
significance of the differentially expressed genes whose expression levels are
included in the array, and only those genes with a score having an absolute value of
greater than 4, greater than 5, greater than 6, greater than 7, greater than 8, greater
than 9, greater than 10, greater than 12, greater than 25, greater than 27, greater than
30, or greater than 35 are selected.
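
By way of non-limiting illustration, selection by absolute score can be sketched as
follows. The sketch assumes an unequal-variance (Welch-style) t score, that is, the
difference of the two group means divided by its standard error; the exact form of
the T-statistics metric is given in the Experimental section and may differ. All
names and numbers below are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Dict, List


def t_statistic(group: List[float], rest: List[float]) -> float:
    """Unequal-variance (Welch-style) t score for one gene."""
    se = sqrt(variance(group) / len(group) + variance(rest) / len(rest))
    if se == 0.0:
        raise ValueError("both groups are constant; the score is undefined")
    return (mean(group) - mean(rest)) / se


def select_by_t(scores: Dict[str, float], threshold: float) -> List[str]:
    """Keep genes whose score exceeds the threshold in absolute value."""
    return [gene for gene, t in scores.items() if abs(t) > threshold]


# Made-up expression values: samples in one risk group versus all others.
data = {
    "gene_up": ([10.0, 12.0, 11.0, 13.0, 14.0], [1.0, 2.0, 3.0, 2.0, 2.0]),
    "gene_flat": ([5.0, 6.0, 4.0, 5.0, 6.0], [5.0, 4.0, 6.0, 6.0, 5.0]),
}
scores = {gene: t_statistic(g, r) for gene, (g, r) in data.items()}
selected = select_by_t(scores, threshold=4.0)
```

Using the absolute value retains genes that are strongly downregulated in the
group as well as those that are strongly upregulated.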

In yet another embodiment, the Wilkins' metric is used to determine the
significance of the differentially expressed genes whose expression levels are
included in the array, and only those genes having a score of greater than 0.55, greater
than 0.57, greater than 0.59, greater than 0.61, greater than 0.63, greater than 0.65,
greater than 0.67, greater than 0.69, greater than 0.71, greater than 0.73, greater than
0.75, greater than 0.77, greater than 0.79, greater than 0.81, greater than 0.83, or
greater than 0.85 are selected.

Each value in the expression profiles of the invention is a measurement
representing the absolute or the relative expression level of a differentially expressed
gene. The expression levels of these genes may be determined by any method
known in the art for assessing the expression level of an RNA or protein molecule in a
sample. For example, expression levels of RNA may be monitored using a membrane
blot (such as used in hybridization analysis such as Northern, Southern, dot, and the
like), or microwells, sample tubes, gels, beads or fibers (or any solid support
comprising bound nucleic acids). See U.S. Patent Nos. 5,770,722, 5,874,219,
5,744,305, 5,677,195 and 5,445,934, which are expressly incorporated herein by
reference. The gene expression monitoring system may also comprise nucleic acid
probes in solution.

In one embodiment of the invention, microarrays are used to measure the 
values to be included in the expression profiles. Microarrays are particularly well 
suited for this purpose because of the reproducibility between different experiments. 
DNA microarrays provide one method for the simultaneous measurement of the 
expression levels of large numbers of genes. Each array consists of a reproducible 
pattern of capture probes attached to a solid support. Labeled RNA or DNA is 
hybridized to complementary probes on the array and then detected by laser scanning. 
Hybridization intensities for each probe on the array are determined and converted to 
a quantitative value representing relative gene expression levels. See the
Experimental section. See also U.S. Pat. Nos. 6,040,138, 5,800,992, 6,020,135,
6,033,860, and 6,344,316, which are incorporated herein by reference. High-density
oligonucleotide arrays are particularly useful for determining the gene expression
profile for a large number of RNAs in a sample.

In one approach, total mRNA isolated from the sample is converted to labeled 
cRNA and then hybridized to an oligonucleotide array. Each sample is hybridized to 
a separate array. Relative transcript levels are calculated by reference to appropriate 
controls present on the array and in the sample. See, for example, the Experimental 
section. 

In another embodiment, the values in the expression profile are obtained by
measuring the abundance of the protein products of the differentially-expressed genes. 
The abundance of these protein products can be determined, for example, using 
antibodies specific for the protein products of the differentially-expressed genes. The 
term "antibody" as used herein refers to an immunoglobulin molecule or
immunologically active portion thereof, i.e., an antigen-binding portion. Examples of
immunologically active portions of immunoglobulin molecules include F(ab) and
F(ab')2 fragments, which can be generated by treating the antibody with an enzyme
such as pepsin.

The antibody can be a polyclonal, monoclonal, recombinant (e.g., chimeric
or humanized), fully human, non-human (e.g., murine), or single chain antibody. In a
preferred embodiment it has effector function and can fix complement. The antibody
can be coupled to a toxin or imaging agent.

A full-length protein product from a differentially-expressed gene, or an
antigenic peptide fragment of the protein product, can be used as an immunogen.
Preferred epitopes encompassed by the antigenic peptide are regions of the protein
product of the differentially expressed gene that are located on the surface of the
protein, e.g., hydrophilic regions, as well as regions with high antigenicity. The
antibody can be used to detect the protein product of the differentially expressed gene
in order to evaluate the abundance and pattern of expression of the protein. These
antibodies can also be used diagnostically to monitor protein levels in tissue as part of
a clinical testing procedure, e.g., to determine the efficacy of a given
therapy. Detection can be facilitated by coupling (i.e., physically linking) the
antibody to a detectable substance (i.e., antibody labeling). Examples of detectable
substances include various enzymes, prosthetic groups, fluorescent materials,
luminescent materials, bioluminescent materials, and radioactive materials. Examples
of suitable enzymes include horseradish peroxidase, alkaline phosphatase,
β-galactosidase, or acetylcholinesterase; examples of suitable prosthetic group
complexes include streptavidin/biotin and avidin/biotin; examples of suitable
fluorescent materials include umbelliferone, fluorescein, fluorescein isothiocyanate,
rhodamine, dichlorotriazinylamine fluorescein, dansyl chloride or phycoerythrin; an
example of a luminescent material includes luminol; examples of bioluminescent
materials include luciferase, luciferin, and aequorin; and examples of suitable
radioactive material include ¹²⁵I, ¹³¹I, ³⁵S or ³H.

Once the values comprised in the subject expression profile and the reference
expression profile or expression profiles are established, the subject profile is
compared to the reference profile to determine whether the subject expression profile
is sufficiently similar to the reference profile. Alternatively, the subject expression
profile is compared to a plurality of reference expression profiles to select the
reference expression profile that is most similar to the subject expression profile.
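
By way of non-limiting illustration, selecting the most similar reference profile
can be sketched in Python. The sketch assumes that similarity is measured by the
Pearson correlation between profiles listing the same genes in the same order; this
measure is chosen here only for illustration and is not one of the supervised
learning algorithms of the Experimental section. The function names and all
numeric values are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def most_similar(subject: List[float],
                 references: Dict[str, List[float]]) -> Tuple[str, float]:
    """Return the name and similarity of the closest reference profile."""
    scores = {name: pearson(subject, ref) for name, ref in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Made-up reference profiles over the same four genes, in the same order.
references = {
    "T-ALL": [9.0, 1.0, 1.0, 2.0],
    "E2A-PBX1": [1.0, 9.0, 2.0, 1.0],
    "TEL-AML1": [1.0, 2.0, 9.0, 1.0],
}
subject = [8.0, 2.0, 1.0, 3.0]
group, similarity = most_similar(subject, references)
```

A caller that needs the "sufficiently similar" variant rather than the "most
similar" variant can compare the returned similarity against a chosen cutoff.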

Any method known in the art for comparing two or more data sets to detect
similarity between them may be used to compare the subject expression profile to the
reference expression profiles. In some embodiments, the subject expression profile
and the reference profile are compared using a supervised learning algorithm such as
the support vector machine (SVM) algorithm, prediction by collective likelihood of
emerging patterns (PCL) algorithm, the k-nearest neighbor algorithm, or the Artificial
Neural Network algorithm. Each of these algorithms is described in the Experimental
section of the application. To determine whether a subject expression profile shows
"statistically significant similarity" or "sufficient similarity" to a reference profile,
statistical tests may be performed to determine whether the similarity between the
subject expression profile and the reference expression profile is likely to have been
due to a random event. An example of such a statistical test is the permutation
test described in the Experimental section; however, any statistical test that can
calculate the likelihood that the similarity between the subject expression profile and
the reference profile results from a random event can be used. The accuracy of
assigning a subject to a risk group based on similarity between an expression profile
for the subject and an expression profile for the risk group depends in part on the
degree of similarity between the two profiles. Therefore, when more accurate
diagnoses are required, the stringency with which the similarity between the subject
expression profile and the reference profile is evaluated should be increased. For
example, in various embodiments, the p-value obtained when comparing the subject
expression profile to a reference profile that shares sufficient similarity with the
subject expression profile is less than 0.20, less than 0.15, less than 0.10, less than 
0.09, less than 0.08, less than 0.07, less than 0.06, less than 0.05, less than 0.04, less 
than 0.03, less than 0.02, or less than 0.01.

In some embodiments, the assignment of a subject affected by leukemia to a
leukemia risk group, the prediction of whether a subject affected by leukemia has an
increased risk of relapse, or the prediction of whether a subject affected by
leukemia has an increased risk of developing secondary AML is used in a method of
choosing a therapy for the subject affected by leukemia. A therapy, as used herein,
refers to a course of treatment intended to reduce or eliminate the effects or symptoms
of a disease, in this case leukemia. A therapy regimen will typically comprise, but is
not limited to, a prescribed dosage of one or more drugs or hematopoietic stem cell
transplantation. Therapies, ideally, will be beneficial and reduce the disease state, but
in many instances a therapy will have non-desirable effects as well.
Thus, the methods of the invention are useful for monitoring the effectiveness of a
therapy even when non-desirable side-effects are observed.

Arrays, Computer-Readable Medium, and Kits 

The present invention provides compositions that are useful in determining the
gene expression profile for a subject affected by leukemia and selecting a reference
profile that is similar to the subject expression profile. These compositions include
arrays comprising a substrate having capture probes that can bind specifically to
nucleic acid molecules that are differentially expressed in leukemia risk groups,
subjects affected by leukemia who will relapse after conventional therapy, or subjects
affected by leukemia who will develop secondary AML after conventional therapy.
Also provided is a computer-readable medium having digitally-encoded reference
profiles useful in the methods of the claimed invention. The invention also
encompasses kits comprising an array of the invention and a computer-readable
medium having digitally-encoded reference profiles with values representing the
expression of nucleic acid molecules detected by the arrays. These kits are useful for
assigning a subject affected by leukemia to a leukemia risk group, predicting whether
a subject affected by leukemia has an increased risk of relapse, and predicting whether
a subject affected by leukemia has an increased risk of developing secondary AML.

The present invention provides arrays comprising capture probes for detecting
the differentially expressed genes of the invention. By "array" is intended a solid
support or substrate with peptide or nucleic acid probes attached to said support or
substrate. Arrays typically comprise a plurality of different nucleic acid or peptide
capture probes that are coupled to a surface of a substrate in different, known
locations. These arrays, also described as "microarrays" or colloquially "chips," have
been generally described in the art, for example, in U.S. Patent Nos. 5,143,854,
5,445,934, 5,744,305, 5,677,195, 6,040,193, 5,424,186, 6,329,143, and 6,309,831 and
Fodor et al. (1991) Science 251:767-77, each of which is incorporated by reference in
its entirety. These arrays may generally be produced using mechanical synthesis
methods or light directed synthesis methods which incorporate a combination of
photolithographic methods and solid phase synthesis methods.

Techniques for the synthesis of these arrays using mechanical synthesis 
methods are described in, e.g., U.S. Patent No. 5,384,261, incorporated herein by 
reference in its entirety for all purposes. Although a planar array surface is preferred, 
the array may be fabricated on a surface of virtually any shape or even a multiplicity 
of surfaces. Arrays may be peptides or nucleic acids on beads, gels, polymeric 
surfaces, fibers such as fiber optics, glass or any other appropriate substrate; see U.S.
Pat. Nos. 5,770,358, 5,789,162, 5,708,153, 6,040,193 and 5,800,992, each of which is 
hereby incorporated in its entirety for all purposes. Arrays may be packaged in such a 
manner as to allow for diagnostics or other manipulation of an all-inclusive device. 
See, for example, U.S. Pat. Nos. 5,856,174 and 5,922,591 herein incorporated by 
reference. 

The arrays provided by the present invention comprise capture probes that can 
specifically bind a nucleic acid molecule that is differentially expressed in leukemia 
risk groups, a nucleic acid molecule that is differentially expressed in subjects 
affected by leukemia who will relapse after conventional therapy, or a nucleic acid 
molecule that is differentially expressed in subjects affected by leukemia who will 
develop secondary AML after conventional therapy. These arrays can be used to 
measure the expression levels of nucleic acid molecules to thereby create an 
expression profile for use in methods of determining the diagnosis and prognosis for 
leukemia patients, and for monitoring the efficacy of a therapy in these patients as 
described elsewhere herein. 

In some embodiments, each capture probe in the array detects a nucleic acid
molecule selected from the nucleic acid molecules designated in Tables 2-36, 44-49,
52, 54-60, 63-68, and 70-74. The designated nucleic acid molecules include those
differentially expressed in leukemia risk groups selected from the T-ALL risk group
(Tables 7, 14, 21, 28, 35, 59, and 67), E2A-PBX1 risk group (Tables 3, 10, 17, 24, 31,
55, 64, and 71), TEL-AML1 risk group (Tables 8, 15, 22, 29, 36, 60, 68, and 74),
BCR-ABL risk group (Tables 2, 9, 16, 23, 30, 54, 63, and 70), MLL risk group
(Tables 5, 12, 19, 26, 33, 57, 66, and 73), Hyperdiploid >50 risk group (Tables 4, 11,
18, 25, 32, 56, 65, and 72), and Novel risk group (Tables 6, 13, 20, 27, 34, and 58);
those differentially expressed in subjects affected by leukemia who will relapse after
conventional therapy (Tables 44-48); and those differentially expressed in subjects
affected by TEL-AML1 who will develop secondary AML after conventional therapy
(Table 52).

The arrays of the invention comprise a substrate having a plurality of addresses,
where each address has a capture probe that can specifically bind a target nucleic
acid molecule. The number of addresses on the substrate varies with the purpose for
which the array is intended. The arrays may be low-density arrays or high-density
arrays and may contain 4 or more, 8 or more, 12 or more, 16 or more, 20 or more, 24
or more, 32 or more, 48 or more, 64 or more, 72 or more, 80 or more, 96 or more,
192 or more, 288 or more, 384 or more, 768 or more, 1536 or more,
3072 or more, 6144 or more, 9216 or more, 12288 or more, 15360 or more, or 18432
or more addresses. In some embodiments, the substrate has no more than 12, 24, 48,
96, 192, or 384 addresses, no more than 500, 600, 700, 800, or 900 addresses, or no
more than 1000, 1200, 1600, 2400, or 3600 addresses.

The invention also provides a computer-readable medium comprising one or
more digitally-encoded expression profiles, where each profile has one or more values
representing the expression of a gene that is differentially expressed in a leukemia risk
group, the expression level of a gene that is differentially expressed in subjects
affected by leukemia who will relapse after conventional therapy, or the expression
level of a gene that is differentially expressed in subjects affected by leukemia who
will develop secondary AML after conventional therapy. Such profiles are described
elsewhere herein. In some embodiments, the digitally-encoded expression profiles are
comprised in a database. See, for example, U.S. Patent No. 6,308,170.
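
By way of illustration only, one possible digital encoding of such expression profiles is sketched below. The `ExpressionProfile` record, its field names, and the choice of JSON as the storage format are hypothetical assumptions made for the sketch; the specification does not prescribe any particular encoding or database layout, and the values shown are not data from the study.

```python
import json
from dataclasses import dataclass, asdict, field

@dataclass
class ExpressionProfile:
    """One digitally-encoded profile (hypothetical layout)."""
    label: str                                   # e.g. a leukemia risk group
    values: dict = field(default_factory=dict)   # gene label -> expression value

def encode_profiles(profiles):
    """Serialize a list of profiles to a JSON string."""
    return json.dumps([asdict(p) for p in profiles], sort_keys=True)

def decode_profiles(text):
    """Rebuild the list of profiles from a JSON string."""
    return [ExpressionProfile(**item) for item in json.loads(text)]

# Illustrative values only.
stored = encode_profiles([
    ExpressionProfile("E2A-PBX1", {"MERTK": 9.0, "HOXA9": 1.0}),
    ExpressionProfile("MLL", {"MERTK": 1.0, "HOXA9": 9.0}),
])
restored = decode_profiles(stored)
print(restored[0].label)  # E2A-PBX1
```

A text encoding of this kind can be written to any computer-readable medium or loaded into a database table keyed on the profile label.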

The present invention also provides kits useful for diagnosing, treating, and
monitoring the disease state in subjects affected by leukemia. These kits comprise an
array and a computer-readable medium. The array comprises a substrate having
addresses, where each address has a capture probe that can specifically bind a nucleic
acid molecule that is differentially expressed in at least one leukemia risk group, in a
subject affected by leukemia who will relapse after conventional therapy, or in a
subject affected by leukemia who will develop secondary AML after conventional
therapy. The results are converted into a computer-readable medium that has
digitally-encoded expression profiles containing values representing the expression
level of a nucleic acid molecule detected by the array.

Methods of Screening and Therapeutic Targets 

The methods and compositions of the invention may be used to screen test 
compounds to identify therapeutic compounds useful for the treatment of leukemia. 
In one embodiment, the test compounds are screened in a sample comprising primary 
cells or a cell line representative of a particular leukemia risk group. After treatment 
with the test compound, the expression levels in the sample of one or more of the 
differentially-expressed genes of the invention are measured using methods described 
elsewhere herein. Values representing the expression levels of the differentially- 
expressed genes are used to generate a subject expression profile. This subject 
expression profile is then compared to a reference profile associated with the 
leukemia risk group represented by the sample to determine the similarity between the 
subject expression profile and the reference expression profile. Differences between 
the subject expression profile and the reference expression profile may be used to 
determine whether the test compound has anti-leukemogenic activity. 

The test compounds of the present invention can be obtained using any of the 
numerous approaches in combinatorial library methods known in the art, including: 
biological libraries; spatially addressable parallel solid phase or solution phase 
libraries; synthetic library methods requiring deconvolution; the 'one-bead
one-compound' library method; and synthetic library methods using affinity
chromatography selection. The biological library approach is limited to polypeptide
libraries, while the other four approaches are applicable to polypeptide, non-peptide 
oligomer or small molecule libraries of compounds (Lam (1997) Anticancer Drug 
Des. 12:145). 

Examples of methods for the synthesis of molecular libraries can be found in
the art, for example in DeWitt et al. (1993) Proc. Natl. Acad. Sci. USA 90:6909; Erb
et al. (1994) Proc. Natl. Acad. Sci. USA 91:11422; Zuckermann et al. (1994) J. Med.
Chem. 37:2678; Cho et al. (1993) Science 261:1303; Carell et al. (1994) Angew.
Chem. Int. Ed. Engl. 33:2059; Carell et al. (1994) Angew. Chem. Int. Ed. Engl.
33:2061; and in Gallop et al. (1994) J. Med. Chem. 37:1233. Libraries of compounds
may be presented in solution (e.g., Houghten (1992) Biotechniques 13:412-421), or on
beads (Lam (1991) Nature 354:82-84), chips (Fodor (1993) Nature 364:555-556),
bacteria (U.S. Patent No. 5,223,409), spores (U.S. Patent No. 5,223,409), plasmids
(Cull et al. (1992) Proc. Natl. Acad. Sci. USA 89:1865-1869) or on phage (Scott and
Smith (1990) Science 249:386-390; Devlin (1990) Science 249:404-406; Cwirla et
al. (1990) Proc. Natl. Acad. Sci. U.S.A. 87:6378-6382; Felici (1991) J. Mol. Biol.
222:301-310).

Candidate compounds include, for example, 1) peptides such as soluble peptides,
including Ig-tailed fusion peptides and members of random peptide libraries (see, e.g.,
Lam et al. (1991) Nature 354:82-84; Houghten et al. (1991) Nature 354:84-86) and
combinatorial chemistry-derived molecular libraries made of D- and/or L-configuration
amino acids; 2) phosphopeptides (e.g., members of random and partially degenerate,
directed phosphopeptide libraries, see, e.g., Songyang et al. (1993) Cell 72:767-778); 3)
antibodies (e.g., polyclonal, monoclonal, humanized, anti-idiotypic, chimeric, and single
chain antibodies as well as Fab, F(ab')2, Fab expression library fragments, and
epitope-binding fragments of antibodies); 4) small organic and inorganic molecules (e.g.,
molecules obtained from combinatorial and natural product libraries); 5) zinc analogs; 6)
leukotriene A4 and derivatives; 7) classical aminopeptidase inhibitors and derivatives
of such inhibitors, such as bestatin and arphamenine A and B and derivatives; and 8)
artificial peptide substrates and other substrates, such as those disclosed herein above
and derivatives thereof.

The present invention discloses a number of genes that are differentially
expressed in leukemia risk groups, in subjects affected by leukemia who will relapse
after conventional therapy, or in subjects affected by leukemia who will develop
secondary AML after conventional therapy. These differentially-expressed genes are
shown in Tables 2-36, 44-48, and 52. Because the expression of these genes is
associated with leukemia risk factors, these genes may play a role in leukemogenesis.
Accordingly, these genes and their gene products are potential therapeutic targets that
are useful in methods of screening test compounds to identify therapeutic compounds
for the treatment of leukemia.

The differentially-expressed genes of the invention may be used in cell-based
screening assays involving recombinant host cells expressing the differentially-
expressed gene product. The recombinant host cells are then screened to identify
compounds that can activate the product of the differentially-expressed gene (i.e.,
agonists) or inactivate the product of the differentially-expressed gene (i.e.,
antagonists).

Any of the leukemogenic functions mediated by the product of the differentially
expressed gene may be used as an endpoint in the screening assay for identifying
therapeutic compounds for the treatment of leukemia. Such endpoint assays include
assays for cell proliferation, assays for modulation of the cell cycle, assays for the
expression of markers indicative of leukemia, and assays for the expression level of
genes differentially expressed in leukemia risk groups as described above.

Modulators of the activity of a product of a differentially-expressed gene
identified according to the drug screening assays provided above can be used to treat a
subject with leukemia. These methods of treatment include the step of administering
the modulators of the activity of a product of a differentially-expressed gene, in a
pharmaceutical composition as described herein, to a subject in need of such treatment.




The following examples are offered by way of illustration and not by way of 
limitation. 

EXAMPLES 

EXAMPLE 1:

To determine if gene expression profiling of leukemic cells could identify 
known biologic ALL subgroups, 327 diagnostic bone marrow (BM) samples were 
analyzed with AFFYMETRIX® oligonucleotide microarrays (Affymetrix Inc., Santa 
Clara, CA) containing 12,600 probe sets. 

In an initial analysis of the gene expression data set (12,600 probe sets in 327
leukemia samples; greater than 4 x 10^6 data elements), an unsupervised
two-dimensional hierarchical clustering algorithm was used to group leukemia samples
with similar gene expression patterns against clusters of similarly expressed genes.
This analysis clearly identified 6 major leukemia subtypes that corresponded to
T-ALL, hyperdiploid with >50 chromosomes, BCR-ABL, E2A-PBX1, TEL-AML1, and
MLL gene rearrangement. Moreover, within the heterogeneous collection of
leukemias that were not assigned to one of these subtypes, a novel subgroup of 14
cases was identified that had a distinct gene expression profile. The separation of
these seven leukemia subgroups was also seen using the multidimensional scaling
procedure of discriminant analysis with variance (DAV), in which the data are
reduced into component dimensions consisting of linear combinations of
discriminating genes. For example, using the three component dimensions that
accounted for 72.8% of the variance of gene expression among the subgroups, it was
possible to distinguish T-ALL (43 cases), E2A-PBX1 (27 cases), TEL-AML1 (79
cases) and hyperdiploid >50 (64 cases) from the remaining ALL subtypes (114 cases).
Similarly, three different components that account for an additional 16.1% of
the variance in gene expression made it possible to discriminate cases with BCR-ABL
(15 cases), MLL gene rearrangement (20 cases) and the novel subgroup of ALL (14
cases).
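
By way of illustration only, two-dimensional hierarchical clustering of a gene-by-sample matrix may be sketched as follows. The sketch assumes average-linkage agglomerative clustering with a correlation-based distance (one minus the Pearson correlation); the linkage rule, the distance measure, the requested number of clusters, and the toy matrix are assumptions made for the sketch and are not stated in this section.

```python
import math

def correlation_distance(x, y):
    """1 - Pearson correlation; 1.0 when either vector is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 1.0
    return 1.0 - cov / (sd_x * sd_y)

def hierarchical_clusters(rows, n_clusters):
    """Average-linkage agglomerative clustering of rows.

    Returns a sorted list of clusters, each a sorted list of row indices.
    Assumes 1 <= n_clusters <= len(rows).
    """
    n = len(rows)
    dist = {(i, j): correlation_distance(rows[i], rows[j])
            for i in range(n) for j in range(i + 1, n)}

    def average_linkage(a, b):
        total = sum(dist[(min(i, j), max(i, j))] for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        # Merge the pair of clusters with the smallest average distance.
        _, p, q = min(
            (average_linkage(clusters[p], clusters[q]), p, q)
            for p in range(len(clusters))
            for q in range(p + 1, len(clusters))
        )
        merged = sorted(clusters[p] + clusters[q])
        clusters = [c for k, c in enumerate(clusters) if k not in (p, q)]
        clusters.append(merged)
    return sorted(clusters)

def two_way_clustering(matrix, n_sample_clusters, n_gene_clusters):
    """Cluster samples (columns) and genes (rows) of the same matrix."""
    samples = [list(column) for column in zip(*matrix)]
    return (hierarchical_clusters(samples, n_sample_clusters),
            hierarchical_clusters(matrix, n_gene_clusters))

# Toy matrix: rows are genes, columns are samples (illustrative values).
toy_matrix = [
    [9, 8, 1, 2],
    [8, 9, 2, 1],
    [1, 2, 9, 8],
    [2, 1, 8, 9],
]
sample_clusters, gene_clusters = two_way_clustering(toy_matrix, 2, 2)
print(sample_clusters)  # [[0, 1], [2, 3]]
```

Clustering the columns groups samples with similar expression patterns, while clustering the rows groups similarly expressed genes, which together give the two-dimensional arrangement.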

Statistical methods were used to identify those genes that best define the
individual groups. Expression profiles were obtained using the top 40 genes per
subgroup as selected by a chi-square metric. Distinct groups of genes distinguish
cases defined by E2A-PBX1, MLL, T-ALL, hyperdiploid >50, BCR-ABL, the novel
subgroup, and TEL-AML1. In addition to these specific subgroups, 65 cases (20% of
the total) were identified that did not cluster into any of the leukemia subtypes. The
expression profiles of these latter cases varied markedly, suggesting that they
represent a heterogeneous group of leukemias. Nearly identical results were obtained
when the hierarchical clustering was performed with genes selected by other
statistical metrics.
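
By way of illustration only, ranking genes by a chi-square metric may be sketched as follows. The sketch assumes that each gene's expression is dichotomized at its median across all samples and that a Pearson chi-square statistic is computed on the resulting 2 x 2 table (high or low expression versus membership in the subgroup); the dichotomization rule, the function names, and the toy values are assumptions made for the sketch and are not taken from this section.

```python
from statistics import median

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        return 0.0  # an empty row or column carries no information
    return n * (a * d - b * c) ** 2 / denominator

def rank_genes(expression, in_group, top_n):
    """Rank genes by chi-square association with subgroup membership.

    expression: gene label -> list of expression values, one per sample
    in_group:   list of booleans, True when the sample is in the subgroup
    """
    scores = {}
    for gene, values in expression.items():
        cutoff = median(values)  # assumed dichotomization point
        a = sum(1 for v, g in zip(values, in_group) if v > cutoff and g)
        b = sum(1 for v, g in zip(values, in_group) if v > cutoff and not g)
        c = sum(1 for v, g in zip(values, in_group) if v <= cutoff and g)
        d = sum(1 for v, g in zip(values, in_group) if v <= cutoff and not g)
        scores[gene] = chi_square_2x2(a, b, c, d)
    ranked = sorted(scores, key=lambda gene: (-scores[gene], gene))
    return ranked[:top_n], scores

# Toy data: three samples inside the subgroup, three outside.
in_group = [True, True, True, False, False, False]
expression = {
    "gene_A": [9, 8, 9, 1, 2, 1],   # high only inside the subgroup
    "gene_B": [5, 1, 4, 5, 2, 4],   # unrelated to the subgroup
    "gene_C": [3, 3, 3, 3, 3, 3],   # constant
}
top_genes, scores = rank_genes(expression, in_group, top_n=1)
print(top_genes)  # ['gene_A']
```

Setting `top_n=40` would correspond to retaining the top 40 genes per subgroup.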

For T-ALL, two gene clusters that discriminated this subtype from B-lineage
cases were identified. One cluster was expressed at high levels and one cluster was
expressed at low levels. In contrast, the top ranked discriminating genes for each of
the other leukemia subtypes consisted primarily of genes that were overexpressed
within the specific leukemia subtype. With the exception of T-ALL, the identified
expression profiles do not represent a specific differentiation stage of the leukemic
blasts. For example, although E2A-PBX1 is almost exclusively found in ALLs with a
pre-B cell immunophenotype (Hunger (1996) Blood 87:1211-24), the identified
expression profile was specific for the E2A-PBX1 genetic lesion and not the pre-B
immunophenotype.

To confirm that the microarray analysis provided an accurate reflection of
actual gene expression levels, the microarray data was compared with results for RNA
levels obtained by real-time RT-PCR (5 genes). In addition, the corresponding
protein levels were assessed by immunophenotype analysis performed by flow
cytometry (nine specific cell surface antigens). A very high degree of
correlation was observed between the levels of RNA expression detected by
quantitative RT-PCR and microarray analysis. Similarly, in agreement with results
from immunophenotyping, T-lineage restricted RNA expression was observed for
CD2, CD3, and CD8, whereas B-lineage restricted expression was observed for
CD19 and CD22. In addition, the level of CD10 RNA expression closely correlated
with protein levels, with high expression detected in TEL-AML1 leukemias,
intermediate levels in E2A-PBX1 and low to undetectable expression in cases with
rearrangements of MLL. Thus, microarray analysis provides an accurate reflection of
expression levels for most genes, and can be used to accurately detect the expression
of the more common surface antigens used in the diagnostic evaluation of pediatric
ALL patients.

The majority of the leukemia subtype specific genes identified through this
study were not previously known to have a restricted pattern of expression. In
addition to their use as diagnostic and subclassification markers, these genes provide
unique insights into the underlying biology of the different leukemia subtypes. For
example, E2A-PBX1 leukemias were characterized by high expression of the c-Mer
receptor tyrosine kinase (MERTK), a known transforming gene (Graham et al. (1994)
Cell Growth Differ. 5:647-657; and Georgescu et al. (1999) Mol. Cell. Biol.
19:1171-81), suggesting that c-Mer may be involved in the abnormal growth of these
cells. Similarly, HOXA9 and MEIS1 were exclusively expressed in cases having MLL
rearrangements, indicating that they may be directly involved in MLL mediated
alterations in the growth of the leukemic cells. Interestingly, high expression of
MTG16, a homologue of ETO (Gamou et al. (1998) Blood 91:4028-4037), was found
in TEL-AML1 cases. Alteration of ETO family members in both t(8;21) acute
myeloid leukemia (by translocation) (Downing (1999) Br. J. Hematol. 106:296-308)
and TEL-AML1 (by altered expression) suggests that alteration in the biologic
function of ETO genes is mechanistically involved in these leukemias.

Little is known about the underlying molecular pathogenesis of hyperdiploid ALL
with >50 chromosomes, which clinically is distinct from hyperdiploid cases having 47-50
chromosomes. This distinction is supported by the marked differences in gene
expression profiles between these two subgroups. Although hyperdiploid >50 ALLs
have an excellent prognosis, the specific genetic lesions responsible for the aberrant
proliferation in these cases remain poorly understood. Interestingly, almost 70% of
the genes that define this subgroup are localized to either chromosome X or 21.
Moreover, the class defining genes on chromosome X were overexpressed in the
hyperdiploid >50 ALLs irrespective of whether the leukemic blasts had
a trisomy of this chromosome (data not shown). Detailed analysis will be required to
determine the specific signaling pathways that are disrupted as a result of the altered
expression of these genes. Lastly, the novel subgroup of ALL was defined by high
expression of a group of genes, including the receptor phosphatase PTPRM, and
LHFPL2, a gene that is a part of the LHFP-like gene family, the founding member of
which was identified as the target of a lipoma-associated chromosomal translocation
(Petit et al. (1999) Genomics 57:438-41).

Expression Profiling as a Diagnostic Tool 

A major goal of this study was to develop a single platform of expression 
profiling to accurately identify the known, prognostically important leukemia 
subtypes. To this end, computer-assisted learning algorithms were used to develop an 
expression-based leukemia classification. Through a reiterative process of error 
minimization, these algorithms learn to recognize the optimal gene expression 
patterns for a leukemia subtype. Classification was approached using a decision tree 
format, in which the first decision was T-ALL versus B-lineage (non-T-ALL), and 
then within the B-lineage subset, cases were sequentially classified into the known 
risk groups characterized by the presence of E2A-PBX1, TEL-AML1, BCR-ABL, 
MLL chimeric genes, and lastly hyperdiploid with >50 chromosomes. Cases not 
assigned to one of these classes were left unassigned. Classification was performed 
using a Support Vector Machine (SVM) algorithm with a set of discriminating genes 
selected by correlation-based feature selection (CFS), or if this method selected
greater than 20 genes for a particular class, by using the top 20 ranked genes selected
by a chi-square metric, or one of the other metrics detailed in the Experimental
Procedures section. This approach resulted in an accurate class prediction in a
randomly selected training set that consisted of two-thirds of the total cases (215
cases). When this classification model was then applied to a blind test set consisting
of the remaining 112 samples, an overall accuracy of 96% was achieved for class
assignment. The number of genes required for optimal class assignment varied
between classes. A single gene was sufficient to give 100% accuracy for both T-ALL
and E2A-PBX1, whereas 7-20 genes were required for prediction of the other classes.
Only slight differences were observed in the prediction accuracy of individual classes
when the process was repeated using genes selected by a number of other metrics,
including T-statistics, a novel metric referred to as Wilkins', or genes selected by a
combination of self organizing maps (SOM) and DAV. Moreover, nearly identical
results were obtained when the various sets of selected genes were used in a number
of different supervised learning algorithms, including K-Nearest Neighbor (k-NN),
Artificial Neural Network (ANN), and prediction by collective likelihood of emerging
patterns (PCL).
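
By way of illustration only, the decision tree format described above, in which a case is tested against each class in a fixed order and left unassigned if no class accepts it, may be sketched as follows. The per-class classifiers here are hypothetical single-marker threshold rules standing in for the trained SVM models; the marker labels, the cutoff of 5.0, and the function names are assumptions made for the sketch.

```python
# Order of decisions as described in the text: T-ALL first, then the
# B-lineage classes in sequence, hyperdiploid >50 last.
DECISION_ORDER = [
    "T-ALL", "E2A-PBX1", "TEL-AML1", "BCR-ABL", "MLL", "Hyperdiploid >50",
]

def classify_case(profile, classifiers, order=DECISION_ORDER):
    """Return the first class whose binary classifier accepts the profile."""
    for subtype in order:
        if classifiers[subtype](profile):
            return subtype
    return "Unassigned"

def marker_above(gene, cutoff):
    """Hypothetical stand-in for a trained classifier: one marker, one cutoff."""
    return lambda profile: profile.get(gene, 0.0) > cutoff

# Illustrative rules only; the real models use genes selected per class.
toy_classifiers = {
    "T-ALL": marker_above("CD3", 5.0),
    "E2A-PBX1": marker_above("MERTK", 5.0),
    "TEL-AML1": marker_above("MTG16", 5.0),
    "BCR-ABL": marker_above("bcr_abl_marker", 5.0),
    "MLL": marker_above("HOXA9", 5.0),
    "Hyperdiploid >50": marker_above("hyperdiploid_marker", 5.0),
}

print(classify_case({"MERTK": 8.0, "HOXA9": 1.0}, toy_classifiers))  # E2A-PBX1
print(classify_case({"MERTK": 1.0, "HOXA9": 1.0}, toy_classifiers))  # Unassigned
```

Because the classes are tested sequentially, a case accepted by an earlier classifier is never presented to the later ones.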

Four cases initially appeared to be misclassified as TEL-AML1 by gene
expression analysis since they lacked a detectable chimeric transcript by RT-PCR.
Upon further analysis by FISH, however, one of these cases was shown to have a
TEL-AML1 fusion, presumably a variant rearrangement that could not be detected
with the amplification primers used for the TEL-AML1 RT-PCR assay. In each of
the three remaining cases, re-examination of the karyotypes revealed translocations
involving the p arm of chromosome 12. FISH analysis demonstrated that two of these
cases had deletion of one TEL allele, whereas the remaining case had a partial
deletion of one TEL allele. Thus, the identified expression profiles appear to reflect
an abnormality of the TEL transcription factor, and may in fact provide a more
accurate means of identifying a specific leukemia subtype defined by its underlying
biology. Collectively, these data demonstrate that the single platform of gene
expression profiling can accurately identify the known prognostic subtypes of ALL.

Use of Expression Profiles to Identify Patients at High Risk of Treatment Failure

Relapse and the development of therapy-induced acute myeloid leukemia
(AML) are the major causes of treatment failure in pediatric ALL. To determine if
expression profiling might further enhance the ability to identify patients who are
likely to relapse, the expression profiles of four groups of leukemic samples were
compared. The groups of samples used for this comparison were: (i) diagnostic
samples from patients who developed hematological relapses (n = 32); (ii) diagnostic
samples from patients who remained in continuous complete remission (CCR) (n =
201); (iii) diagnostic samples from patients who developed therapy-induced AML (n
= 16); and (iv) leukemic samples collected at the time of ALL relapse (n = 25). Using
DAV, distinct gene expression profiles were identified for each of these groups.

To further assess the predictive power of the different gene expression
profiles, supervised learning algorithms were used. Because of the overwhelming
differences in the expression profiles of the different leukemia subtypes, it was not
possible to identify a single expression signature that would predict relapse
irrespective of the genetic subtype. However, within individual leukemic subtypes,
distinct expression profiles could be defined that predicted relapse. Class assignment
was performed using an SVM supervised learning algorithm with discriminating genes
selected by CFS, or if this method returned >20 genes, the top 20 genes selected by
T-statistics. For both the T-lineage and hyperdiploid >50 subgroups, expression profiles
identified those cases that went on to relapse with an accuracy of 97% and 100%,
respectively, as assessed by cross validation. Moreover, the predictive accuracy was
statistically significant when compared to results from an analysis of 1000 random
permutations of the specific patient data set. Similarly, expression profiles predictive
of relapse were identified for TEL-AML1, MLL, or cases that lacked any of the known
genetic risk features. Although the predictive accuracy of these latter expression
profiles was very high as assessed by cross validation, it did not reach statistical
significance when compared to results from an analysis of 1000 random permutations
of the same patient data set, likely secondary to the limited number of cases. The
patterns of expression for a combination of genes, rather than expression levels of a
single gene, were found to have the greatest predictive accuracy. Since few known
risk-stratifying biologic features have been previously identified for either T-ALL or
hyperdiploid >50 ALL, the results suggest that the identified expression profiles
provide independent risk stratifying information.
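
By way of illustration only, the assessment of predictive accuracy by cross validation, followed by comparison against random permutations of the class labels, may be sketched as follows. The sketch assumes leave-one-out cross validation, substitutes a simple nearest-centroid classifier for the SVM, and uses the common (count + 1) / (permutations + 1) convention for the p-value; all three choices, as well as the toy data, are assumptions made for the sketch and are not stated in this section.

```python
import random

def nearest_centroid_predict(train_x, train_y, sample):
    """Predict the label whose class centroid is closest to the sample."""
    centroids = {}
    for label in set(train_y):
        members = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [sum(col) / len(members) for col in zip(*members)]

    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    # sorted() makes tie-breaking deterministic
    return min(sorted(centroids),
               key=lambda label: squared_distance(centroids[label], sample))

def loocv_accuracy(samples, labels):
    """Leave-one-out cross validation: hold out each case in turn."""
    correct = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if nearest_centroid_predict(train_x, train_y, samples[i]) == labels[i]:
            correct += 1
    return correct / len(samples)

def permutation_p_value(samples, labels, n_permutations=1000, seed=0):
    """Compare observed accuracy with accuracies under shuffled labels."""
    rng = random.Random(seed)
    observed = loocv_accuracy(samples, labels)
    shuffled = list(labels)
    at_least_as_good = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)
        if loocv_accuracy(samples, shuffled) >= observed:
            at_least_as_good += 1
    return observed, (at_least_as_good + 1) / (n_permutations + 1)

# Toy data: two well-separated groups of five cases (illustrative values).
samples = [[1.0, 1.2], [0.8, 1.0], [1.1, 0.9], [0.9, 1.1], [1.2, 0.8],
           [5.0, 5.1], [4.8, 5.2], [5.2, 4.9], [5.1, 5.0], [4.9, 4.8]]
labels = ["CCR"] * 5 + ["relapse"] * 5

accuracy, p_value = permutation_p_value(samples, labels, n_permutations=1000)
print(accuracy)  # 1.0
```

With few cases, only a small number of distinct label arrangements exist, so even a perfect cross-validated accuracy may fail to reach statistical significance, which is consistent with the limitation noted above for the smaller subgroups.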

A distinct expression profile was identified in the ALL blasts from patients
who developed therapy-induced AML. Because secondary AML is thought to arise
from a hematopoietic stem cell that is distinct from that giving rise to the primary
leukemia, it is difficult to understand how the biology of the original ALL blasts
could predict the risk of developing a therapy-induced complication. However, when
the accuracy of expression profiling was evaluated within the TEL-AML1
subgroup, a distinct expression signature consisting of 20 genes was defined. This
profile identified, with 100% accuracy in cross validation, all patients who developed
secondary AML, with a p value of 0.031 as assessed by comparison to results from an
analysis of 1000 random permutations of the patient data set. Genes within this
signature included RSU1, a suppressor of the Ras signaling pathway, and Msh3, a
mismatch repair enzyme.


Overview of Experimental Procedures 

A. Tumor Samples 

The diagnosis of ALL was based on the morphologic evaluation of the bone
marrow and on the pattern of reactivity of the leukemic blasts with a panel of
monoclonal antibodies directed against lineage-associated antigens. A total of 389
pediatric acute leukemia samples were analyzed in this study, from which high quality
gene expression data was obtained on 360 (93%). The successfully-analyzed samples
included 332 diagnostic BM samples, 3 diagnostic peripheral blood (PB) samples, and
25 relapsed ALL samples from BM or PB. 264 (79%) of the diagnostic ALL BM samples
and all relapse samples were from patients enrolled on St. Jude Children's Research
Hospital Total Therapy Studies XIIIA or XIIIB and corresponded to 64% of the patients
treated on these protocols. The details of these protocols have been previously
published (Pui et al. (2000) Leukemia 14:2286-94). The remaining samples were
obtained from patients treated on St. Jude Total Therapy Studies XI, XII, XIV, XV, or
by best clinical management. All protocols and consent forms were approved by the
hospital's institutional review board, and informed consent was obtained from
parents, guardians, or patients (as appropriate). The composition of the data sets used
for the identification of gene expression profiles predictive of specific genetic
subtypes, hematological relapse, and risk of developing secondary AML is described
below.

B. Gene Expression Profiling

RNA was extracted from cryopreserved mononuclear cell suspensions from
diagnostic BM aspirates or PB samples using TRIZOL® (Invitrogen Corp., Carlsbad,
California) according to the manufacturer's instructions, and the RNA integrity was
assessed by using an Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto,
CA). cDNA was synthesized using a T7-linked oligo-dT primer and cRNA was then
synthesized with biotinylated UTP and CTP. The labeled RNA was then fragmented
and hybridized to HG_U95Av2 oligonucleotide arrays (Affymetrix Incorporated,
Santa Clara, CA) according to the manufacturer's instructions.

Arrays were scanned using a laser confocal scanner (Agilent) and the
expression value for each gene was calculated using AFFYMETRIX® Microarray
Software version 4.0. The average intensity difference (AID) values were normalized
across the sample set and minimum quality control standards were established for
including a sample's hybridization data in the study. Ten percent of samples were run
in duplicate to ensure consistency of data acquisition throughout the study. A high
level of reproducibility was observed between replicate samples, with fewer than 1%
of genes showing a variation in average intensity difference of greater than 2-fold.

C. Statistical Analysis

Unsupervised hierarchical clustering, principal component analysis (PCA),
discriminant analysis with variance (DAV), and self organizing maps (SOM) were
performed using GeneMaths software (version 1.5, Applied Maths, Belgium). Data
reduction to define the genes most useful in class distinction was performed using a
variety of metrics as detailed below. Genes selected by the various metrics were used
in supervised learning algorithms to build classifiers that could identify the specific
genetic or prognostic subgroups. The algorithms used included k-Nearest Neighbors
(k-NN), Support Vector Machine (SVM), prediction by collective likelihood of
emerging patterns (PCL), an artificial neural network (ANN), and weighted voting.
Performance of each model was initially assessed by leave-one-out cross validation
on a randomly selected stratified training set consisting of two-thirds of the total
cases. True error rates of the best performing classifiers were then determined using
the remaining third of the samples as a blinded test group. Details of the individual
metrics and supervised learning algorithms are described below.
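
A stratified two-thirds/one-third division of cases, as described above, might be
sketched as follows. The function name, the rounding rule for uneven class sizes,
and the toy labels are assumptions for illustration; the source does not specify
how the random stratified selection was implemented.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=2 / 3, seed=0):
    """Split sample indices into train/test sets, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        # Assumed rounding rule: nearest integer to the requested fraction.
        n_train = round(len(idxs) * train_fraction)
        train.extend(idxs[:n_train])
        test.extend(idxs[n_train:])
    return sorted(train), sorted(test)


if __name__ == "__main__":
    toy_labels = ["T-ALL"] * 9 + ["MLL"] * 6  # hypothetical class sizes
    train_idx, test_idx = stratified_split(toy_labels)
    print(len(train_idx), len(test_idx))
```

Leave-one-out cross validation would then be run on the training indices only, with
the held-back test indices scored once by the chosen classifier.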

Detailed Experimental Procedures

A. RNA Extraction, Labeling, Hybridization, and Data Analysis

Mononuclear cell suspensions from diagnostic BM aspirates or peripheral
blood (PB) samples were prepared from each patient and an aliquot was
cryopreserved. RNA was extracted using TRIZOL® following the manufacturer's
recommended protocol as described above. RNA integrity was assessed by
electrophoresis on the Agilent 2100 Bioanalyzer (Agilent, Palo Alto, CA).

First and second strand cDNA were synthesized from 5-15 µg of total RNA
using the Superscript Double-Stranded cDNA Synthesis Kit (Invitrogen Corp.,
Carlsbad, California) and an oligo-dT24-T7 (5'-GGC CAG TGA ATT GTA ATA
CGA CTC ACT ATA GGG AGG CGG-3'; SEQ ID NO:1) primer according to the
manufacturer's instructions. cRNA was synthesized and labeled with biotinylated
UTP and CTP by in vitro transcription using the T7 promoter coupled double stranded
cDNA as template and the T7 RNA Transcript Labeling Kit according to the
manufacturer's instructions (Enzo Diagnostics Inc., Farmingdale NY). Briefly, double
stranded cDNA synthesized from the previous steps was washed twice with 70%
ethanol and resuspended in 22 µl RNase-free water. The cDNA was incubated with 4
µl of 10X each reaction buffer, 1 µl of biotin labeled ribonucleotides, 2 µl of DTT,
1 µl of RNase inhibitor mix and 2 µl 20X T7 RNA polymerase for 5 hours at 37°C. The
labeled cRNA was separated from unincorporated ribonucleotides by passing through
a CHROMA SPIN-100 column (Clontech, Palo Alto, CA) and precipitated at -20°C
for 1 hr to overnight.

The cRNA pellet was resuspended in 10 µl RNase-free H2O and 10.0 µg was
fragmented by heat and ion-mediated hydrolysis at 95°C for 35 minutes in 200 mM
Tris-acetate, pH 8.1, 500 mM KOAc, 150 mM MgOAc. The fragmented cRNA was
hybridized for 16 hr at 45°C to HG_U95Av2 AFFYMETRIX® oligonucleotide arrays
(Affymetrix, Santa Clara, CA) containing 12,600 probe sets from full-length
annotated genes together with additional probe sets designed to represent EST
sequences. Arrays were washed at 25°C with 6X SSPE (0.9M NaCl, 60 mM
NaH2PO4, 6 mM EDTA, 0.01% Tween 20) followed by a stringent wash at 50°C with
100 mM MES, 0.1M NaCl, 0.01% Tween 20. The arrays were then stained with
phycoerythrin conjugated streptavidin (Molecular Probes, Eugene, OR).

Arrays were scanned using a laser confocal scanner (Agilent, Palo Alto, CA)
and the expression value for each gene was calculated using AFFYMETRIX®
Microarray software (MAS 4.0). The signal intensity for each gene was calculated as
the average intensity difference (AID), represented by [Σ(PM - MM)/(number of
probe pairs)], where PM and MM denote perfect-match and mismatch probes,
respectively. Expression values were normalized across the sample set by scaling the
average of the fluorescent intensities of all genes on an array to a constant target
intensity of 2500, then any AID over 45,000 was capped to a value of 45,000. All
AIDs less than 100, including negative values and absent calls, were converted to a
value of 1. In addition, a variation filter was used to eliminate any probe set in which
fewer than 1% of the samples had a present call, or if the Max AID - Min AID across
the sample set was less than 100. The average intensity differences for each of the
remaining genes were analyzed. For some metrics the data were log transformed prior
to analysis. The minimum quality control values required for inclusion of a sample's
hybridization data in the study were 10% or greater present calls, a GAPDH/Actin
3'/5' ratio <5, and use of a scaling factor that was within 3 standard deviations from
the mean of the scaling values of all chips analyzed.
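
The AID calculation, scaling, capping, flooring, and variation filter described
above can be sketched as follows. This is an illustrative reading of the text, not
the MAS 4.0 implementation: the function names are hypothetical, and scaling by the
plain arithmetic mean of all AID values on an array is an assumption.

```python
from statistics import mean

TARGET_INTENSITY = 2500.0   # constant target intensity stated in the text
AID_CAP = 45000.0           # values above this are capped
AID_FLOOR = 100.0           # values below this are set to 1


def average_intensity_difference(pm, mm):
    """AID = sum(PM - MM) / number of probe pairs."""
    return sum(p - m for p, m in zip(pm, mm)) / len(pm)


def normalize_array(aids):
    """Scale one array to the target mean, then cap high and floor low values."""
    factor = TARGET_INTENSITY / mean(aids)  # assumed: plain mean over all genes
    normalized = []
    for value in aids:
        scaled = value * factor
        if scaled > AID_CAP:
            scaled = AID_CAP
        elif scaled < AID_FLOOR:
            scaled = 1.0
        normalized.append(scaled)
    return normalized


def passes_variation_filter(values_across_samples, present_calls):
    """Keep a probe set only if >=1% of samples are 'present' and max - min >= 100."""
    fraction_present = sum(present_calls) / len(present_calls)
    spread = max(values_across_samples) - min(values_across_samples)
    return fraction_present >= 0.01 and spread >= 100.0


if __name__ == "__main__":
    print(average_intensity_difference([500.0, 700.0], [100.0, 300.0]))
    print(normalize_array([-50.0, 50.0, 3000.0, 7000.0]))
```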

The average percent present calls for the overall data set was 29.7%, and for
each of the genetic subgroups was BCR-ABL (31.1%), E2A-PBX1 (28.9%), Hyper
>50 (31%), MLL (29.8%), T-ALL (29.1%), TEL-AML1 (28.5%), Novel (30.2%), and
others (31.1%). In addition, each sample had >75% blasts. The average percentage
blasts for the overall data set used to define the genetic subtypes was 93%, and for
each genetic subtype was BCR-ABL (92%), E2A-PBX1 (96%), Hyper >50 (93%),
MLL (93%), T-ALL (91%), TEL-AML1 (92%), Novel (95%), and others (94%).

B. Reproducibility of Microarray Data

The reproducibility of the AFFYMETRIX® microarray system was assessed
by comparing the gene expression profiles of RNA extracted from duplicate
cryopreserved diagnostic leukemic samples from 23 patients, and of single RNA
samples from 13 patients analyzed on two separate arrays. The mean number of
probe sets that displayed a ≥2-fold difference in expression between separately
extracted but paired RNA samples was 144, and for single RNA samples analyzed on
two separate occasions was 133. Moreover, very few probe sets were found to have a
≥3-fold difference in expression levels between replicate samples. The observed
number of probe sets showing a difference in expression values represents less than
2% of the total number of probe sets on the microarray, and thus these data suggest
that the AFFYMETRIX® microarray system has a very high degree of
reproducibility.
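
The replicate comparison above amounts to counting probe sets whose paired values
differ by at least a given fold. A minimal sketch follows; the function names and
toy values are hypothetical, and the inputs are assumed to be normalized values of
at least 1 so that the ratio is always defined.

```python
def fold_difference(a, b):
    """Symmetric fold difference between two positive expression values."""
    return max(a, b) / min(a, b)


def count_discordant(run_a, run_b, fold=2.0):
    """Number of probe sets differing by at least `fold` between two replicate runs."""
    return sum(1 for a, b in zip(run_a, run_b) if fold_difference(a, b) >= fold)


if __name__ == "__main__":
    replicate_1 = [100.0, 400.0, 1.0, 1000.0]  # hypothetical normalized AID values
    replicate_2 = [250.0, 410.0, 1.0, 300.0]
    print(count_discordant(replicate_1, replicate_2, fold=2.0))
    # Fraction of the 12,600 probe sets represented by the reported mean of 144.
    print(144 / 12600)
```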

C. Comparison of Expression Profiles from PB and BM Leukemia Samples

Matched BM and PB samples that contained ≥80% leukemic blasts were
obtained from 10 patients and the RNA was extracted and assessed by microarray
analysis. A very high level of correlation was observed between the expression
profiles of BM and PB, with only 189 probe sets having a greater than 2-fold
difference in expression. No genes were found to be consistently over- or
under-expressed in one sample type. These data demonstrate that there are minimal
differences in the gene expression profiles of leukemic blasts obtained from BM or
PB, and that diagnostic gene expression profiling is possible on samples obtained
from the PB.

D. RT-PCR Results

Real-time TAQMAN® RT-PCR assays (Applied Biosystems, Foster City,
CA) were performed to independently determine the level of mRNA for five genes
that were found by microarray analysis to be predictive of either T-lineage ALL
(CD3δ, CD3D antigen delta polypeptide TiT3 complex; MAL, mal T-cell
differentiation protein; and PRKCQ, protein kinase C theta) or E2A-PBX1 expressing
ALL (MERTK, c-Mer proto-oncogene tyrosine kinase and KIAA802). The RNA
samples analyzed included four samples each of E2A-PBX1 and T-ALL, and two
samples each from the remaining subtypes (BCR-ABL, MLL, TEL-AML1,
Hyperdiploid >50, Hyperdiploid 47-50, Hypodiploid, Pseudodiploid, and normal).
Whenever possible, the forward and reverse primers were designed in different exons
so that DNA contamination would not be a concern. In the case of MAL where this
was not clear, the RNA was treated for 15 minutes at room temperature with 1.0 unit
of DNase I (Invitrogen Corp., Carlsbad, California) using the Invitrogen protocol to
remove any contaminating DNA.

Thirty-three ng of RNA from each sample was reverse transcribed using
random hexamers and Multiscribe Reverse Transcriptase (Applied Biosystems, Foster
City, CA) in a total volume of 10 µl. Real time PCR was performed on an Applied
Biosystems PRISM® 7700 Sequence Detection System (Applied Biosystems). All
probes were labeled at the 5' end with FAM (6-carboxy-fluorescein) and at the 3' end
with TAMRA (6-carboxy-tetramethyl-rhodamine).

The PCR reactions were performed in a total volume of 50 µl containing 10 µl
of the reverse transcriptase product, 300 nM each of the forward and reverse primers,
100 nM of probe, 1X master mix and 1 µl of AMPLITAQ GOLD® DNA polymerase
(Applied Biosystems). Following a 10 minute incubation at 95°C to activate the
polymerase, samples were denatured at 95°C for 15 seconds, then annealed and
extended at 60°C for 1 minute, for a total of 40 cycles. The RNA from each sample
was also amplified using primers and probes to RNase P (Applied Biosystems) for use
in normalization according to the manufacturer's instructions. Negative controls were
included in each run. Standard curves were generated for T-cell markers and RNase P
using MOLT4 RNA, a T-cell leukemia cell line, and for the E2A-PBX1 markers and
RNase P using a leukemia cell line, 697, that contains an E2A-PBX1 fusion.

The expression levels of the predictive genes and RNase P were determined in
each of the 24 ALL samples. A ratio was then calculated by taking the expression
value for the specific gene and dividing it by the expression level of RNase P in the
sample. These ratios were then compared to the values obtained from the
AFFYMETRIX® chip data from the same RNA sample. The raw AFFYMETRIX®
chip data were scaled as described and then normalized using the 3' GAPDH value for
each sample, yielding a normalized ratio. The TAQMAN® results and
AFFYMETRIX® chip ratios were then log transformed and compared. Since the
markers selected for TAQMAN® analysis were predictors for either E2A-PBX1 or
T-ALL, each gene was expected to have four RNA samples with high and 20 samples
with low expression. For each gene evaluated, an average expression value for both
the TAQMAN® results and AFFYMETRIX® data was calculated for all samples in
the up-regulated group, and similarly, for the samples in the down-regulated group.

E. Comparison of Real-time RT-PCR Data and AFFYMETRIX® Chip Data

The normalized gene expression ratios for the TAQMAN® data (gene/RNase
P) and for the AFFYMETRIX® microarray data (AID for a gene/AID for GAPDH)
were log transformed and then the average expression value for each gene was
calculated in the four samples in which its expression was expected to be up-regulated
and separately in the 20 samples in which its expression was expected to be
down-regulated. For example, for genes that were expected to be up-regulated in T-ALL
(CD3δ, MAL, and PRKCQ), the log expression ratios in the T-ALL samples were
averaged to give the up-regulated value and the log expression ratios of each gene in
the non-T-ALL cases were averaged to give the down-regulated value.

In both the TAQMAN® and the microchip array analysis, MERTK and
KIAA802 were very highly expressed in the diagnostic samples containing
E2A-PBX1, and expressed at low levels in all of the other samples. Likewise, PRKCQ,
CD3δ, and MAL showed high levels of expression in T cells by both methodologies
in comparison with non-T cells. The normalized ratios from the TAQMAN® assay
were plotted against the normalized ratios from the microchip array for both the
up-regulated and down-regulated genes. The correlation between TAQMAN® results
and the microchip array results was 70%, indicating that the same pattern of gene
expression was seen in both analyses. MERTK expression was extremely high in two
of the E2A-PBX1 patient samples by TAQMAN® analysis. Removal of the MERTK gene
from the analysis resulted in a correlation of 91% between the TAQMAN® results
and the microchip array results.
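
The comparison described above (normalize each gene to a reference, log transform,
average within the expected up- and down-regulated groups, then correlate the two
platforms) can be sketched as follows. The base-10 logarithm, the use of a Pearson
coefficient, and all function names are assumptions; the text does not state which
logarithm base or correlation statistic was used.

```python
import math


def log_ratio(gene_value, reference_value):
    """log10 of a gene's signal divided by its reference signal (RNase P or GAPDH)."""
    return math.log10(gene_value / reference_value)


def group_means(log_ratios, expected_up):
    """Average log ratios separately for samples expected to be up- or down-regulated."""
    up = [r for r, flag in zip(log_ratios, expected_up) if flag]
    down = [r for r, flag in zip(log_ratios, expected_up) if not flag]
    return sum(up) / len(up), sum(down) / len(down)


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


if __name__ == "__main__":
    taqman = [1.0, 1.0, -1.0, -3.0]       # hypothetical log ratios for one gene
    flags = [True, True, False, False]    # samples expected to be up-regulated
    print(group_means(taqman, flags))
```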

F. Comparison of AFFYMETRIX® Microarray Chip Results and Immunophenotype Results

Leukemic blasts at the time of diagnosis were analyzed for expression of
lineage restricted cell surface antigens using phycoerythrin- or fluorescein
isothiocyanate-conjugated monoclonal antibodies against CD2, CD3ε, CD4, CD5,
CD7, CD8, CD10, CD19, and CD22 (Becton Dickinson Immunocytometry Systems,
San Jose, CA, USA). Data were obtained using a COULTER® EPICS XL™
(Beckman Coulter, Miami, FL), a COULTER® ELITE™ (Beckman Coulter), or a
BD FACSCalibur™ flow cytometer (Becton Dickinson, San Jose, CA). The
expression patterns for these antigens were then compared to gene expression patterns
for the AFFYMETRIX® chip sites specified for CD2 (1 probe set, 40738_at), CD3δ
(1 probe set, 38319_at), CD3ε (1 probe set, 36277_at), CD3ζ (1 probe set, 37078_at),
CD3γ (1 probe set, 39226_at), CD4 (5 probe sets, 856_at, 1146_at, 35517_at,
34003_at, and 37942_at), CD5 (1 probe set, 32953_at), CD7 (1 probe set, 771_s_at),
CD8α (1 probe set, 40699_at), CD8β (1 probe set, 39239_at), CD10 (1 probe set,
1389_at), CD19 (2 probe sets, 1096_g_at and 1116_at), and CD22 (2 probe sets,
38521_at and 38522_s_at). As a control, the performance of the AFFYMETRIX®
microarray probe sets was also assessed using RNA isolated from flow sorted single
positive CD4+ and CD8+ thymocytes, and CD10+/CD19+ bone marrow cells. High
RNA expression was observed in T-ALL for the T-lineage restricted genes CD2,
CD3δ, ε, and ζ, CD8α, and CD7, and in B-lineage ALLs for the B-cell restricted
genes CD19 and CD22. A similar high level of correlation was observed between
RNA and protein expression for CD10. The observed low expression levels of T-cell
restricted genes in B-cell cases, and B-cell restricted genes in T-ALLs, are consistent
with the low level of normal contaminating lymphocytes present in the diagnostic
marrow samples analyzed.

G. Patient Data Set

A total of 389 pediatric acute leukemia samples were analyzed in this study,
from which high quality gene expression data were obtained on 360 (93%). The
successfully analyzed samples included: 332 diagnostic bone marrows (BM), 3
diagnostic peripheral blood samples (PB), and 25 relapse ALL samples from BM or
PB. 264 (79%) of the diagnostic ALL BM samples and all relapse samples were from
patients treated on St. Jude Children's Research Hospital Total Therapy Studies XIIIA
or XIIIB and correspond to 64% of the patients treated on these protocols. The details
of these protocols are described in Pui et al., "Risk-adapted treatment for acute
lymphoblastic leukemia: findings from St. Jude Children's Research Hospital,"
Haematology and Blood Transfusions, 1997, pp 629-37, Springer-Verlag, Berlin and
in Pui et al. (2000) Leukemia 14:2286-94. Study XIIIA ran from December 20, 1991
to August 23, 1994 and enrolled 165 patients, whereas Study XIIIB ran from August
24, 1994 to July 27, 1998 and enrolled 247 patients. No patients were lost to follow-up
during treatment. When the databases were frozen for analysis, 100% and 93% of
event-free survivors in studies XIIIA and XIIIB, respectively, had been seen within 12
months. The median (minimum, maximum) follow-up of the event-free survivors
was 8.09 (6.59, 9.94) and 4.52 (2.37, 7.06) years for XIIIA and XIIIB, respectively.
All other samples were obtained from patients treated on St. Jude Total Therapy
Studies XI, XII, XIV, XV, or by best clinical management.

For the identification of gene expression profiles that predict specific genetic
subtypes of ALL, 327 diagnostic BM samples were used. The criteria for inclusion in
this data set were the availability of a cryopreserved diagnostic BM sample containing
≥75% blasts, and complete data from each of the following diagnostic studies:
morphology, immunophenotype, cytogenetics, DNA ploidy, Southern blot for MLL
gene rearrangements, and RT-PCR analysis for MLL-AF4, MLL-AF9, E2A-PBX1,
TEL-AML1, and BCR-ABL. This final data set includes diagnostic BM samples
from XV (38), XIV (4), XIIIA (100), XIIIB (161), or from patients treated on one of
our older protocols or by best clinical management (24).

The data sets used to identify expression profiles predictive of hematologic
relapse and the development of therapy-induced AML are described in Table 1.

Table 1: Patient Database

Diagnostic samples used for subtype classification (n=327)

BCR-ABL subgroup (n=15)

Label@                 Protocol#  Outcome%        Label@                 Protocol#  Outcome%
BCR-ABL-C1             T13B       CCR             BCR-ABL-#4             T11        NA
BCR-ABL-R1             T13A       Heme Relapse    BCR-ABL-#5             T12        NA
BCR-ABL-R2             T13A       Heme Relapse    BCR-ABL-#6             T12        NA
BCR-ABL-R3             T13B       Heme Relapse    BCR-ABL-#7             T12        NA
BCR-ABL-Hyperdip-R5    T13B       Heme Relapse    BCR-ABL-#8             T14        NA
BCR-ABL-#1             T13A       Censored        BCR-ABL-#9             T15        NA
BCR-ABL-#2             T13B       Censored        BCR-ABL-Hyperdip-#10   T12        NA
BCR-ABL-#3             T13B       Censored

E2A-PBX1 subgroup (n=27)

Label@                 Protocol#  Outcome%        Label@                 Protocol#  Outcome%
E2A-PBX1-C1            T13A       CCR             E2A-PBX1-#1            Others     NA
E2A-PBX1-C2            T13A       CCR             E2A-PBX1-#2            Others     NA
E2A-PBX1-C3            T13A       CCR             E2A-PBX1-#3            Others     NA
E2A-PBX1-C4            T13A       CCR             E2A-PBX1-#4            Others     NA
E2A-PBX1-C5            T13A       CCR             E2A-PBX1-#5            Others     NA
E2A-PBX1-C6            T13B       CCR             E2A-PBX1-#6            Others     NA
E2A-PBX1-C7            T13B       CCR             E2A-PBX1-#7            T11        NA
E2A-PBX1-C8            T13B       CCR             E2A-PBX1-#8            T11        NA
E2A-PBX1-C9            T13B       CCR             E2A-PBX1-#9            T12        NA
E2A-PBX1-C10           T13B       CCR             E2A-PBX1-#10           T12        NA
E2A-PBX1-C11           T13B       CCR             E2A-PBX1-#11           T14        NA
E2A-PBX1-C12           T13B       CCR             E2A-PBX1-#12           T15        NA
E2A-PBX1-R1            T13B       Heme Relapse    E2A-PBX1-#13           T15        NA
E2A-PBX1-2M#1          T13B       2nd AML

Hyperdip>50 subgroup (n=64)






T4 vn prH i r»> S O-P 1 


T13A 


PPT? 


Hvt>erdio>50-C33 


T13B 


CCR 


xxypci ixjllj>'^-» vv^a- 


T13A 


L-LK 


^^yp p 


T13B 


CCR 




T13A 


CCR 


Hyperdip>50-C35 


T13B 


CCR 


T-T vnprH i rv> S 0- P4 
xiypci tiip^ j 


T13A 


CCR 


Hyperdip>50-C36 


T13B 


CCR 


T4\rr»<orrliT^">Sn PS 

riypcraip -^ju -v_^j 


X L-jr\ 


CCR 


Hyperdip>50-C37 


T13B 


CCR 




Tt^A 


CCR 


Hyperdip>50-C3S 


T13B 


CCR 


Hyperdip>50-C7 


T1 ^ A 


CCR 


Hyperdip>50-C39 


T13B 


CCR 


Hyperdip>50-C8 


T13A 


CCR 


Hyperdip>3U-C4U 


nri ip 
1 Ud 


PPT? 


Hyperdip>50-C9 


T13A 


CCR 


Hyperdip>50-C41 


1 IjD 


PPR 


Hyperdip>50-C10 


T13A 




yp p 


T13B 


CCR 


Hyperdip>50-Cl 1 


T13A 


PPP 




T13B 


CCR 










Heme 


TJvnprdin>S0-P1 
xxy pci ui|j»^j v/~ x — 


T13A 


CCR 


Hyperdip>50-Rl 


T13A 


Relapse 










Heme 


Hyperdip>50-C13 


T13A 


CCR 


Hyperdip>50-R2 


T13A 


Relapse 










Heme 


Hyperdip>50-C14 


T13A 


CCR 


Hyperdip>50-R3 


T13A 


Relapse 
Heme 


Hyperdip>50-Ci5 


T13B 


CCR 


TT J" ^ /\ T\ A 

Hyperdip>50-R4 


1 1 DD 


T? plancp 
xvciapoc 










Heme 


WA^<*rHirY>^fi-Pl fk 

jj.yperu.ip'-^-) u-v^ 1 u 


T13B 


PPT? 




T13B 


Relapse 


Ww»prf1m> SO-PI 7 


T13B 


PPT? 


H vnerdir>>5 0-2M# 1 


T13A 


2nd AML 


W\mprrHn>SO-P1 8 

■^yp p 


T13B 


CCR 


rlyperaip>D u-zivitfz 


T13B 


2nd AML 


T-f\mprHm>SO-P 1 Q 

-^yp p 


T13B 


CCR 


Hyperdip>50-#1 


T13A 


Censored 


ny uci nip' — *j \j 


T13B 


CCR 


TT J* -^r A -W~\ 

Hyperdip>50-#2 


T13B 


Censored 




T13B 


CCR 


TT J' C f\ JlO 

Hyperdip>50-#3 


Others 


NA 


T4 vnprH in> S 0-P7 9 


T13B 


CCR 


TT _ 1 • — r A JJ. A 

Hyperdip>5 0-#4 


Others 


NA 


Hyperaip>3U-CzJ 


T1 I'D 


CCR 


TT _ .1 " _ *T /\ -LLC 

Hyperdip>5 0-#5 


T12 


NA 


Hyperdip>50-C24 


T13B 


CCR 


Hyperdip>50-#6 


1 ID 




Hyperdip>5 0-C25 


T13B 


CCR 


Hyperdip>50-#7 


Tl S 
1 ij 


NA 


Hyperdip>50-C26 


T13B 


CCR 


Hyperdip>50-#8 


T1 S 

l J. J 


NA 


riyperaip-^j u- 








T15 


NA 


P97-M 


T13B 


CCR 


Hyperdip>5 0-#9 


Hyperdip>50-C28 


T13B 


CCR 


Hyperdip>50-#10 


T15 


NA 


Hyperdip>50-C29 


T13B 


CCR 


Hyperdip>50-#1 1 


T15 


NA 


Hyperdip>50-C30 


1 13J3 


CCR 


Hyperdip>50-#12 


Tl S 
1 u 


NA 


Hyperaip>jU-C3 1 


i JoxJ 


CCR 


Hyperdip>50-#13 


T15 

X 1J 


NA 


Hyperdip>50-C32 


T 1 ion 

T13B 


CCR 


Hyperdip>50-#14 




"MA 






Hyperdip47-50 subgroup (n=23)








Hyperdip47-50- 










CCR 


Cl 


T13A 


CCR 


Hyperdip47-50-C13 


T13B 


Hyperdip47-50- 










CCR 


C2 


T13A 


CCR 


Hyperdip47-50-C14~N 


T13B 


Hyperdip47-50- 










CCR 


C3-N 


T13A 


CCR 


Hyperdip47-50-C15 


T13B 


Hyperdip47-50- 










CCR 


C4 


T13A 


CCR 


Hyperdip47-50-C16 


T13B 


Hyperdip47-50- 








T13B 


CCR 


C5 


T13A 


CCR 


Hyperdip47-50-C17 
Hyperdip47-50-






C6 


T13B 


CCR 








CI 


T13B 


CCR 


HvneTdin47-50- 






cs 


T13B 


CCR 








C9 


T13B 


CCR 


Hyperdip47-50- 






C10 


T13B 


CCR 


Hyperdip47-50- 






Cll 


T13B 


CCR 


Hyperdip47-50- 






C12 


T13B 


CCR 


Hypodip-Cl 


T13A 


CCR 


Hypodip-C2 


T13A 


CCR 


Hypodip-C3 


T13B 


CCR 


Hypodip-C4 


T13B 


CCR 


Hypodip-C5 


T13B 


CCR 



Hyperdip47-50-C18 


T13B 


CCR 


Hyperdip47-50-C19 


T13B 


CCR 


Hyperdip47-50-2M#l 


T13A 


2nd AML 


Hyperdip47-50-#l 


T15 


NA 


Hyperdip47-50-#2 


T15 


NA 


Hyperdip47-50-#3 


T15 


NA 



Hypodip subgroup (n=9)



Hypodip-C6 
Hypodip-2M#l 
Hypodip-#l 
Hypodip-#2 



MLL subgroup (n=20) 



T13B 
T13A 
T15 
T15 



MLL-C1 


T13A 


CCR 




MLL-2M#1 


T1 1 A 


MLL-C2 


T13B 


CCR 




MLL-2M#2 


Tn a 
i 1 dJ\ 


MLL-C3 


T13B 


CCR 




MLL-#1 


1 1 DD 






CCR 




MLL-#2 


T13B 


MLL-C5 


T13B 


CCR 




MLL-#3 


Others 


MLL-C6 


T13B 


CCR 




MLL-#4 


Others 


MLL-R1 


T13A 


Heme Relapse 


MLL-#5 


Others 


MLL-R2 


T13A 


Heme Relapse 


MLL-#6 


T12 


MLL-R3 


T13B 


Heme Relapse 


MLL-#7 


T14 


MLL-R4 


T13B 


Heme Relapse 


MLL-#8 


T14 








Normal subgroup (n=18)




Nonnal-Cl-N 


T13A 


CCR 




Normal-CIO 


T13B 


Normal-C2-N 


T13A 


CCR 




Normal-Cll-N 


T13B 


Normal-C3-N 


T13A 


CCR 




Normal-C12 


T13B 


Normal-C4-N 


T13B 


CCR 




Normal-Rl 


T13A 


Normal-C5 


T13B 


CCR 




Normal-R2-N 


T13B 


Normal-C6 


T13B 


CCR 




Normal-R3 


T13B 


Normal-C7-N 


T13B 


CCR 




Normal-#l 


T13A 


Normal-C8 


T13B 


CCR 




Normal-#2 


T13B 


Normal-C9 


T13B 


CCR 




Normal-#3 


T13B 








Pseudodip subgroup (n=29)




Pseudodip-Cl 


T13A 


CCR 




Pseudodip-Cl 6-N 


T13B 


Pseudodip-C2-N 


T13A 


CCR 




Pseudodip-Cl 7 


T13B 


Pseudodip-C3 


T13A 


CCR 




Pseudodip-Cl 8 


T13B 


Pseudodip-C4 


T13A 


CCR 




Pseudodip-C19 


T13B 


Pseudodip-C5 


T13A 


CCR 




Pseudodip-Rl-N 


T13A 



CCR 
2nd AML 
NA 
NA 



2nd AML 
2nd AML 
Censored 
Censored 

NA 

NA 

NA 

NA 

NA 

NA 



CCR 

CCR 

CCR 

Heme 
Relapse 

Heme 
Relapse 

Heme 
Relapse 
Censored 
Censored 
Censored 



CCR 
CCR 
CCR 
CCR 
Heme 
Relapse 
Other



Pseudodip-C6 


T13A 


CCR 


Pseudodip-#l 


T13B 


Relapse 


Pseudodip-C7 


T13A 


CCR 


Pseudodip-#2 


T13B 


Censored 


Pseudodip-C8 


T13A 


CCR 


Pseudodip-#3 


Others 


NA 


Pseudodip-C9 


T13A 


CCR 


Pseudodip-#4 


Others 


NA 


Pseudodip-ClO 


T13B 


CCR 


Pseudodip-#5 


T15 


NA 


Pseudodip-C 1 1 


T13B 


CCR 


.rScUQOUip-rfO 


Tl 5 


NA 


PcfMirlndin-CM 9 


T13B 


CCR 


Pseudodip-#7 


T15 


NA 


Pseudodip-C 13 


T13B 


CCR 


Pseudodip-#8-N 


T15 


NA 


Pseudodip-C 1 4 


T13B 


CCR 


Pseudodip-#9 


T15 


NA 


Pseudodip-C 1 5 


T13B 


CCR 












T-ALL subgroup (n=43)






T-ALL-C1 


T13A 


CCR 


T-ALL-C23 


T13B 


CCR 


T-ALL-C2 


T13A 


CCR 


T-ALL-C24 


T13B 


CCR 


T-ALL-C3 


T13A 


CCR 


T-ALL-C25 


1 IDD 




T-ALL-C4 


T13A 


CCR 


T-ALL-C26 


T13B 


CCR 
Heme 


T-ALL-C5 


T13A 


CCR 


T-ALL-R1 


T13A 


Relapse 
Heme 


T-ALL-C6 


T13A 


CCR 


m ITT T"» 

T-ALL-R2 


T13B 


Relapse 










Heme 


T-ALL-C7 


T13A 




T-AT T -TCI 


T13B 


Relapse 










Heme 


T-ALL-C8 


T13A 


CCR 


T-ALL-R4 




jvc lap be 










Heme 


T-ALL-C9 


T13B 


CCR 


T-ALL-R5 


1 13r> 


xveiapse 










Heme 


T-ALL-C10 


T13B 


CCR 


T-ALL-R6 


1 Ijd 




T-ALL-C1 1 


T13B 


CCR 


T-ALL-2M#1 


JL IjD 


9nH AMT 










Other 


T AT T PT) 




CCR 


T-ALL-#1 


T13B 


Relapse 










Other 


T-ALL-C13 


T13B 


CCR 


T-ALL-#2 


T13B 


Relapse 


T-ALL-C14 


T13B 


CCR 


T-ALL-#4 


T13B 


Censored 


T-ALL-C15 


T13B 


CCR 


T-ALL-#5 


T13B 


Censored 


T-ALL-C16 


T13B 


CCR 


T-ALL-#6 


T15 


NA 


T-ALL-C17 


T13B 


CCR 


T-ALL-#7 


T15 


NA 


T-ALL-C18 


T13B 


CCR 


T-ALL-#8 


1 ID 


IN /A. 


T-AT T -C19 


T13B 


CCR 


T-ALL-#9 


T15 


NA 


T-ALL-C20 


T13B 


CCR 


T-ALL-#10 


T15 


NA 


T-ALL-C21 


T13B 


CCR 


T-ALL-#11 


T15 


NA 


T-ALL-C22 


T13B 


CCR 












TEL-AML1 subgroup (n=79)






TEL-AML1-C1 


T13A 


CCR 


TEL-AML 1-C41 


T13B 


CCR 


TEL-AML 1-C2 


T13A 


CCR 


TEL-AML1 -C42 


T13B 


CCR 


TEL-AML 1-C3 


T13A 


CCR 


TEL-AML1-C43 


T13B 


CCR 


TEL-AML1-C4 


T13A 


CCR 


TEL-AML1 -C44 


T13B 


CCR 


TEL-AML 1-C5 


T13A 


CCR 


TEL- AML 1 -C45 


T13B 


CCR 


TEL-AML1-C6 


T13A 


CCR 


TEL-AML 1 -C46 


T13B 


CCR 


TEL-AML1-C7 


T13A 


CCR 


TEL- AML 1 -C47 


T13B 


CCR 


TEL-AML1-CS 


T13A 


CCR 


TEL-AML1-C48 


T13B 


CCR 


TEL-AML 1-C9 


T13A 


CCR 


TEL-AML1 -C49 


T13B 


CCR 


TEL-AML1-C10 


T13A 


CCR 


TEL-AML 1 -C5 0 


T13B 


CCR 
TEL-AML1-C11


T13A 


CCR 


TEL-AML 1 -CI 2 


T13A 


CCR 


TT7T A1V/1T 1 PI ^ 


T13A 


CCR 


TEL-AML1-C14 


T13A 


CCR 


TEL-AML1-C15 


T13A 




TEL-AML 1 -CI 6 


T13A 


CCR 


TEL-AML 1 -CI 7 


T13A 


CCR 


TEL-AML1-C18 


T13A 


CCR 


TEL-AML 1 -CI 9 


T13A 


CCR 


TEL-AML1 -C20 


T13A 


CCR 




T13A 


CCR 


TEL-AML1 -C22 


T13A 


CCR 


TEL-AML1-C23 


T13A 


CCR 


TEL- AML 1 -C24 


T13A 


CCR 


TEL-AML1-C25 


T13A 


CCR 


TEL-AML1-C26 


T13A 


CCR 


TEL-AML1-C27 


T13A 


CCR 


TEL-AML1-C2S 


T13A 


CCR 


TEL-AML1-C29 


T13B 


CCR 


TEL- AML 1 - C3 0 T13B 


CCR 


TEL-AML1-C3 1 


T13B 


CCR 


TEL-AML1-C32 


T13B 


CCR 


TEL-AML1-C33 


T13B 


CCR 


TEL- AML 1 - C3 4 


T13B 


CCR 


TEL- AML 1 -C3 5 


T13B 


CCR 


TEL-AML1-C36 


T13B 


CCR 


TEL-AML1 -C37 


T13B 


CCR 


TEL-AML1-C38 


T13B 


CCR 


TEL-AML! -C39 


T13.B 


CCR 


TEL-AML1-C40 


T13B 


CCR 



HPT-'T A A XT 1 1 

TEL-AME l 


T13B 


CCR 


nrrrx aa/TT 1 P^9 
1 EL- AML 1 -Ks D £ 


T13B 


CCR 


rriT»T A TV XT 1 C 1 

TEL- AML 1 -CD 3 


T13R 

X 1JD 


CCR 


TEL- AML 1 -C54 


T13B 


CXK 


TEL-AML1-C55 


T13B 


CCR 


TEL-AML1-C56 


T13B 


CCR 


TEL-AML1-C57 


T13B 


CCR 






Heme 


TEL-AML1-R1 


T13A 


Relapse 






Heme 


TEL-AML1-R2 


T13A 


Relapse 






Heme 


fpPT A A XT 1 T> O 

TEL-AML 1 -KJ 


x 1 jd ' 


"Rplari^e 

IVVlUL/O w 


TEL-AML 1 -2M# 1 


T1 1 A 


0r\r\ AMI 

_ I ILL .rvLVXX-/ 


TEL-AML 1 -2M#2 


T13A 


^% J A A XT 

2nd AML 


TEL-AML 1 -2M#3 


T13A 


2nd AML 


TEL-AML 1 -2M#4 


T13B 


/-»„ j a A XT 

2nd AML 


TEL-AML1-2M#5 


T13B 


2nd AML 








1 hL>- ANIL, L-tti 


X 1JO 


RelaDse 


T»T?T A A X T 1 -MO 

1 EL- AMI-/ 1 -WZ 


X XJ*\. 


Censored 


T»T?T A TV XT 1 44 *1 

I rSJLr- AML 1 -fro 


T13A 


Censored 


TTTT A A AT 1 44A 




Censored 


TT?T A A/TT 1 


T1 S 
X x~» 


NA 


TET A \vfT 1 Hf\ 

1 JbJL- AML 1 -rFO 


X A.J 


NA 


I EL- AML I -ft 1 


T1 S 

X l-J 


NA 


1 bL-AML 1 -7f8 


T1 S 


NA 


rp-PT AAyfT 1 -MO 

1 bL-AML 1 -try 


T1 S 


NA 


rpT7T A A XT 1 44 1 A 

TEL-AML 1 -# 1 U 




NA 


TEL-AML1-#11 


T15 


NA 


TEL-AML1-#12 


T15 


NA 


TEL-AML1-#13 


T15 


NA 


TEL-AML1-#14 


T15 


NA 



@ Label key:
  Subtype Name-C#     Dx sample of patient in CCR
  Subtype Name-R#     Dx sample of patient who developed a hematologic relapse
  Subtype Name-#      Dx sample used for subgroup classification only
  Subtype Name-2M#    Dx sample of patient who later developed 2nd AML
  Subtype Name-N      Dx sample in novel group

# Protocol: Protocol that patient was treated on

% Outcome:
  CCR             Continuous complete remission
  Heme Relapse    Hematologic relapse
  Other Relapse   Extramedullary relapse
  2nd AML         Diagnostic samples of patients who later developed 2nd AML
  Censored        Censored due to BM transplant, treated off protocol, or died in CR
  NA              Not applicable, primarily because the patient was not treated on
                  Total 13, and thus is excluded from the analysis used to identify
                  gene expression profiles predictive of outcome



H. Diagnostic Samples Used for Prediction of Prognosis 

In addition to the 201 CCR and 27 Heme Relapse cases listed in Table 1, five
additional relapse cases were included in the prognostic analysis, giving a total of
233 cases for this analysis. These additional cases were not included in the subgroup
prediction data set because they did not meet the established criteria, for the
reasons listed below.

Label         Protocol   Comment
BCR-ABL-R4    T13B       Did not meet QC criteria because it contained 70% blasts
MLL-R5        T13A       Peripheral Blood Sample (90% blasts)
Normal-R4     T13B       Molecular studies not performed
T-ALL-R7      T13A       Peripheral Blood Sample (90% blasts)
T-ALL-R8      T13B       Peripheral Blood Sample (90% blasts)



I. Diagnostic Samples Used for Prediction of Secondary AML

In addition to the 201 CCR and 13 secondary AML cases listed in Table 1,
three additional diagnostic marrow samples from patients who developed secondary
AML were included in the prognostic analysis, giving a total of 217 cases for this
analysis. These additional cases were not included in the diagnostic data set because
they did not meet the established criteria, for the reasons listed below.

Label               Protocol   Comment
Hyperdip>50-2M#3    T12        Non Total 13 diagnostic sample
Hypodip-2M#2        T13B       No molecular studies performed
Hypodip-2M#3        T12        Non Total 13 diagnostic sample

Relapsed Samples (n=25) 

Twenty-five relapse samples were analyzed: 17 samples that were paired with the
diagnostic samples listed above (Subtype Name-2M#), and 8 additional non-paired
relapse samples.

Detailed Analysis

A. Hierarchical cluster analysis of diagnostic cases using all genes that passed the 
variation filter 

Two-dimensional hierarchical clustering was performed using the Pearson
correlation coefficient and the unweighted pair group method using arithmetic
averages (GeneMaths, version 1.5). The results of hierarchical clustering of the 327
diagnostic samples using the 10,991 probe sets that passed the variation filter can be
viewed at our web site, www.stjuderesearch.org/ALLl.
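
The clustering step described above can be illustrated with a minimal sketch. It is not the GeneMaths implementation: it assumes the distance between two profiles is 1 minus their Pearson correlation coefficient, and the function names `pearson` and `upgma` and the toy profiles are hypothetical. Applying the same routine to the rows and then to the columns of an expression matrix would give a two-dimensional clustering.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def upgma(profiles):
    """Average-linkage (UPGMA) clustering with distance 1 - Pearson r.

    Returns the merge history as (members_a, members_b, distance) tuples,
    where members are tuples of indices into `profiles`.
    """
    clusters = {i: (i,) for i in range(len(profiles))}
    # Pairwise distances between the original profiles, computed once.
    dist = {
        frozenset((i, j)): 1.0 - pearson(profiles[i], profiles[j])
        for i, j in combinations(range(len(profiles)), 2)
    }

    def avg_dist(a, b):
        # Unweighted mean over all cross-cluster pairs of original profiles.
        total = sum(dist[frozenset((i, j))] for i in a for j in b)
        return total / (len(a) * len(b))

    merges = []
    next_id = len(profiles)
    while len(clusters) > 1:
        # Pick the pair of current clusters with the smallest average distance.
        (ka, a), (kb, b) = min(
            combinations(clusters.items(), 2),
            key=lambda pair: avg_dist(pair[0][1], pair[1][1]),
        )
        merges.append((a, b, avg_dist(a, b)))
        del clusters[ka], clusters[kb]
        clusters[next_id] = a + b
        next_id += 1
    return merges


# Toy expression profiles (illustrative values only, not study data).
samples = [
    [1.0, 2.0, 3.0, 4.0],  # sample 0
    [2.0, 4.1, 6.0, 8.2],  # sample 1, nearly proportional to sample 0
    [4.0, 3.0, 2.0, 1.0],  # sample 2, anti-correlated with sample 0
    [8.0, 6.2, 3.9, 2.0],  # sample 3, close to sample 2
]
merges = upgma(samples)
for a, b, d in merges:
    print(a, b, round(d, 4))
```

In this toy input the two highly correlated pairs are joined first, and the two resulting clusters are joined last at a distance close to 2, the maximum for anti-correlated profiles.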


B. Methods for gene selection 

Discriminating genes for the various leukemia subtypes were selected using a 
variety of statistical metrics. The individual metrics used and the list of selected probe 
sets and corresponding genes are given below. 


1. Chi-Square 

The Chi square method evaluates each gene individually by measuring its Chi
square statistic with respect to the classes. The method first discretizes the observed
expression values of the gene into several intervals using an entropy-based
discretization method¹. The Chi square statistic of a gene is then calculated as

$\chi^2 = \sum_{i=1}^{m} \sum_{j=1}^{k} (A_{ij} - E_{ij})^2 / E_{ij}$,

summing over intervals $i = 1..m$ and classes $j = 1..k$. $A_{ij}$ is the number of
samples in the $i$th interval that are of the $j$th class. $E_{ij}$ is the expected
frequency of $A_{ij}$ and is calculated as $E_{ij} = R_i \cdot C_j / N$, where $R_i$ is the
number of samples in the $i$th interval, $C_j$ is the number of samples in the $j$th
class, and $N$ is the total number of samples. The genes are then sorted according to
their Chi square statistics: the larger the Chi square statistic, the more important the
gene. The 40 genes with the highest Chi square statistics in each subtype are listed in
Tables 2-8. Generally, using anywhere from the top 20 to the top 40 genes did not
result in significant differences in subtype prediction accuracy. Therefore, only the
top 20 genes were used in subtype prediction, unless noted otherwise.
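
As an illustration of the statistic defined above, the following sketch computes the Chi square value of a gene from expression values that are assumed to be already discretized, and then ranks genes by that value. The function names `chi_square` and `rank_genes`, the interval labels, and the toy data are hypothetical; the entropy-based discretization step itself is not shown.

```python
from collections import Counter


def chi_square(intervals, classes):
    """Chi square statistic of one gene.

    intervals[s] is the discretized expression interval of sample s and
    classes[s] is the class label of sample s.
    """
    n = len(intervals)                      # N: total number of samples
    row = Counter(intervals)                # R_i: samples per interval
    col = Counter(classes)                  # C_j: samples per class
    obs = Counter(zip(intervals, classes))  # A_ij: observed counts
    stat = 0.0
    for i in row:
        for j in col:
            expected = row[i] * col[j] / n  # E_ij = R_i * C_j / N
            stat += (obs[(i, j)] - expected) ** 2 / expected
    return stat


def rank_genes(discretized, classes, top=20):
    """Sort genes by decreasing Chi square statistic and keep the top ones."""
    scores = {g: chi_square(iv, classes) for g, iv in discretized.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top]


# Toy data: six samples, one subtype against all others (illustrative only).
classes = ["subtype", "subtype", "subtype", "other", "other", "other"]
discretized = {
    "gene_a": ["high", "high", "high", "low", "low", "low"],  # separates classes
    "gene_b": ["high", "low", "high", "low", "high", "low"],  # weak association
}
ranking = rank_genes(discretized, classes)
print(ranking, chi_square(discretized["gene_a"], classes))
```

A gene whose intervals coincide exactly with the two classes reaches the maximum value for this layout (here, the number of samples), while a gene with nearly uniform interval counts across classes scores close to zero.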
Table 2. Genes selected by Chi square: BCR-ABL

No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Chi square value | Above/Below Mean
1 | 1637_at | mitogen-activated protein kinase-activated protein kinase 3 | MAPKAPK3 | U09578 | 62.75 | Above
2 | [illegible] | cyclin D2 | CCND2 | D13639 | 59.79 | Above
3 | 40196_at | HYA22 protein | | D88153 | 54.79 | Above
4 | [illegible] | proto-oncogene tyrosine-protein kinase ABL gene | ABL | U07563 | 54.77 | Above
5 | [illegible] | caspase 8 apoptosis-related cysteine protease | CASP8 | X98176 | 49.70 | Above
6 | 1636_g_at | proto-oncogene tyrosine-protein kinase ABL gene | ABL | U07563 | 48.29 | Above
7 | 41295_at | GTT1 protein | GTT1 | AL041780 | 42.60 | Above
8 | 37600_at | extracellular matrix protein 1 | ECM1 | U68186 | 42.60 | Above
9 | 37012_at | capping protein actin filament muscle Z-line beta | CAPZB | U03271 | 38.46 | Above
10 | [illegible] | alkylglycerone phosphate synthase | AGPS | Y09443 | 38.46 | Above
11 | 1326_at | caspase 10 apoptosis-related cysteine protease | CASP10 | U60519 | 37.83 | Above
12 | 34362_at | solute carrier family 2 facilitated glucose transporter member 5 | SLC2A5 | M55531 | 37.54 | Above
13 | 33150_at | disrupter of silencing 10 | SAS10 | AI126004 | 36.95 | Above
14 | 40051_at | TRAM-like protein | KIAA0057 | D31762 | 36.95 | Above
15 | [illegible] | bone marrow stromal cell antigen 2 | BST2 | D28137 | 36.95 | Above
16 | 33172_at | hypothetical protein FLJ10849 | FLJ10849 | T75292 | 36.95 | Above
17 | 37399_at | aldo-keto reductase family 1 member C3 3-alpha hydroxysteroid dehydrogenase type II | AKR1C3 | D17793 | 36.95 | Above
18 | [illegible] | protease cysteine 1 legumain | PRSC1 | D55696 | 36.95 | Above
19 | [illegible] | calponin 3 acidic | CNN3 | S80562 | 33.94 | Above
20 | [illegible] | tubulin alpha 1 isoform 44 | TUBA1 | HG2259-HT2348 | 33.32 | Above
21 | 40504_at | paraoxonase 2 | PON2 | [illegible] | [illegible] | Above
22 | 38578_at | tumor necrosis factor receptor superfamily member 7 | TNFRSF7 | M63928 | 30.47 | Above
23 | 39044_s_at | diacylglycerol kinase delta 130kD | DGKD | D73409 | 29.59 | Below
24 | 36634_at | BTG family member 2 | BTG2 | U72649 | 29.16 | Below
25 | 38119_at | glycophorin C Gerbich blood group | GYPC | X12496 | 29.16 | Above
26 | 32562_at | endoglin Osler-Rendu-Weber syndrome 1 | ENG | X72012 | 27.96 | Above
27 | 33228_g_at | interleukin 10 receptor beta | IL10RB | AI984234 | 27.70 | Below
28 | 37006_at | step II splicing factor SLU7 | SLU7 | AI660656 | 27.15 | Above
29 | 38641_at | Homo sapiens mRNA for TSC-22-like protein | | AJ133115 | 27.15 | Above
30 | 38220_at | dihydropyrimidine dehydrogenase | DPYD | U20938 | 27.15 | Above
31 | 1211_s_at | CASP2 and RIPK1 domain containing adaptor with death domain | CRADD | U84388 | 26.46 | Above
32 | 39730_at | v-abl Abelson murine leukemia viral oncogene homolog 1 | ABL1 | X16416 | 25.90 | Above
33 | 36591_at | tubulin alpha 1 testis specific | TUBA1 | X06956 | 25.90 | Above
34 | 36035_at | anchor attachment protein 1 Gaa1p yeast homolog | GPAA1 | AB002135 | 25.34 | Above
35 | 980_at | Niemann-Pick disease type C1 | NPC1 | AF002020 | 25.29 | Above
36 | 671_at | secreted protein acidic cysteine-rich osteonectin | SPARC | J03040 | 25.29 | Above
37 | 40698_at | C-type calcium dependent carbohydrate-recognition domain lectin superfamily member 2 activation-induced | CLECSF2 | X96719 | 23.80 | Above
38 | 39330_s_at | actinin alpha 1 | ACTN1 | M95178 | 23.70 | Above
39 | 1983_at | cyclin D2 | CCND2 | X68452 | 23.70 | Above
40 | 2001_g_at | ataxia telangiectasia mutated | ATM | U26455 | 22.60 | Above





No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Chi square value | Above/Below Mean
1 | 41146_at | ADP-ribosyltransferase NAD poly ADP-ribose polymerase | ADPRT | J03473 | 187.00 | Above
2 | 1287_at | ADP-ribosyltransferase NAD poly ADP-ribose polymerase | ADPRT | J03473 | 187.00 | Above
3 | 32063_at | pre-B-cell leukemia transcription factor 1 | PBX1 | M86546 | 187.00 | Above
4 | 33355_at | Homo sapiens cDNA FLJ12900 fis clone NT2RP2004321 (by CELERA search of target sequence = PBX1) | PBX1 | AL049381 | 187.00 | Above
5 | 430_at | nucleoside phosphorylase | NP | X00737 | 187.00 | Above
6 | 40454_at | FAT tumor suppressor Drosophila homolog | FAT | X87241 | 176.11 | Above
7 | 753_at | nidogen 2 | NID2 | D86425 | 164.28 | Above
8 | 33821_at | Human DNA sequence from clone RP3-483K16 on chromosome 6p12.1-21.1 | HELO1 | AL034374 | 155.00 | Above
9 | 39614_at | KIAA0802 protein | KIAA0802 | AB018345 | 153.46 | Above
10 | 38340_at | huntingtin interacting protein-1-related | KIAA0655 | AB014555 | 143.85 | Above
11 | 1786_at | c-mer proto-oncogene tyrosine kinase | MERTK | U08023 | 142.34 | Above
12 | 39929_at | KIAA0922 protein | KIAA0922 | AB023139 | 139.97 | Above
13 | 39379_at | Homo sapiens mRNA cDNA DKFZp586C1019 from clone DKFZp586C1019 | | AL049397 | 139.49 | Above
14 | 717_at | GS3955 protein | GS3955 | D87119 | 135.24 | Above
15 | 362_at | protein kinase C zeta | PRKCZ | Z15108 | 131.36 | Above
16 | 33513_at | signaling lymphocytic activation molecule | SLAM | U33017 | 131.36 | Above
17 | 37225_at | KIAA0172 protein | KIAA0172 | D79994 | 131.36 | Above
18 | 854_at | B lymphoid tyrosine kinase | BLK | S76617 | 130.95 | Above
19 | 35974_at | lymphoid-restricted membrane protein | LRMP | U10485 | 123.33 | Above
20 | 36452_at | synaptopodin | KIAA1029 | AB028952 | 123.33 | Above
21 | 40648_at | c-mer proto-oncogene tyrosine kinase | MERTK | U08023 | 120.51 | Above
22 | 38393_at | KIAA0247 gene product | KIAA0247 | D87434 | 120.51 | Above
23 | 38994_at | STAT induced STAT inhibitor-2 | STATI2 | AF037989 | 118.58 | Below
24 | 34861_at | golgi autoantigen golgin subfamily a 3 | GOLGA3 | D63997 | 116.80 | Above
25 | 38748_at | adenosine deaminase RNA-specific B1 homolog of rat RED1 | ADARB1 | U76421 | 114.13 | Above
26 | 40113_at | GS3955 protein | GS3955 | D87119 | 114.13 | Above
27 | 36179_at | mitogen-activated protein kinase-activated protein kinase 2 | MAPKAPK2 | U12779 | 113.43 | Above
28 | 37493_at | colony stimulating factor 2 receptor beta low-affinity granulocyte-macrophage | CSF2RB | H04668 | 113.04 | Above
29 | 578_at | Human recombination activating protein (RAG2) gene | RAG2 | M94633 | 111.32 | Above
30 | 41017_at | myosin-binding protein H | MYBPH | U27266 | 109.73 | Above
31 | 37625_at | interferon regulatory factor 4 | IRF4 | U52682 | 108.51 | Above
32 | 38679_g_at | small nuclear ribonucleoprotein polypeptide E | SNRPE | AA733050 | 106.02 | Above
33 | 1389_at | membrane metallo-endopeptidase neutral endopeptidase enkephalinase CALLA CD10 | MME | J03779 | 105.65 | Below
34 | 34783_s_at | BUB3 budding uninhibited by benzimidazoles 3 yeast homolog | BUB3 | AF047473 | 103.87 | Above
35 | 36959_at | ubiquitin-conjugating enzyme E2 variant 1 | UBE2V1 | U49278 | 103.5[illegible] | Above
36 | 39864_at | cold inducible RNA-binding protein | CIRBP | D78134 | 99.76 | Below
37 | 41862_at | KIAA0056 protein | KIAA0056 | D29954 | 99.76 | Above
38 | 41425_at | Friend leukemia virus integration 1 | FLI1 | M98833 | 96.47 | Above
39 | 37177_at | CD58 antigen lymphocyte function-associated antigen 3 | CD58 | Y00636 | 93.84 | Above
40 | 37485_at | fatty-acid-Coenzyme A ligase very long-chain 1 | FACVL1 | D88308 | 93.17 | Above
Table 4: Genes selected by Chi square for Hyperdiploid >50

No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Chi square value | Above/Below Mean
1 | 36620_at | superoxide dismutase 1 soluble amyotrophic lateral sclerosis 1 adult | SOD1 | X02317 | 52.43 | Above
2 | 37350_at | Human DNA sequence from clone 889N15 on chromosome Xq22.1-22.3 | PSMD10 | AL031177 | [illegible] | Above
3 | 171_at | von Hippel-Lindau binding protein 1 | VBP1 | U56833 | 45.80 | Above
4 | 37677_at | phosphoglycerate kinase 1 | PGK1 | V00572 | 45.80 | Above
5 | 41724_at | accessory proteins BAP31/BAP29 | DXS1357E | X81109 | 45.58 | Above
6 | 32207_at | membrane protein palmitoylated 1 55kD | MPP1 | M64925 | 44.07 | Above
7 | 38738_at | SMT3 suppressor of mif two 3 yeast homolog 1 | SMT3H1 | X99584 | 43.57 | Above
8 | 40480_s_at | FYN oncogene related to SRC FGR YES | | M14333 | 43.57 | Above
9 | 38518_at | sex comb on midleg Drosophila like 2 | SCML2 | Y18004 | 43.20 | Above
10 | 41132_r_at | heterogeneous nuclear ribonucleoprotein H2 H | HNRPH2 | U01923 | 43.15 | Above
11 | 31492_at | muscle specific gene | M9 | AB019392 | 43.01 | Below
12 | 38317_at | transcription elongation factor A SII like 1 | TCEAL1 | M99701 | 41.10 | Above
13 | 40998_at | trinucleotide repeat containing 11 THR-associated protein 230 kDa subunit | TNRC11 | AF071309 | 40.88 | Above
14 | 35688_g_at | mature T-cell proliferation 1 | | Z24459 | 40.52 | Above
15 | 40903_at | ATPase H transporting lysosomal vacuolar proton pump membrane sector associated protein M8-9 | APT6M8-9 | AL049929 | 40.33 | Above
16 | 36489_at | phosphoribosyl pyrophosphate synthetase 1 | PRPS1 | D00860 | 40.33 | Above
17 | 1520_s_at | interleukin 1 beta | IL1B | X04500 | 40.29 | Above
18 | 35939_[illegible]_at | POU domain class 4 transcription factor 1 | POU4F1 | L20433 | 38.74 | Above
19 | 38604_at | neuropeptide Y | NPY | AI198311 | 38.26 | Above
20 | 31863_at | KIAA0179 protein | KIAA0179 | D80001 | 38.26 | Above
21 | 890_at | ubiquitin-conjugating enzyme E2A RAD6 homolog | UBE2A | M74524 | 37.99 | Above
22 | 39402_at | interleukin 1 beta | IL1B | M15330 | 37.92 | Above
23 | 41490_at | phosphoribosyl pyrophosphate synthetase 2 | PRPS2 | Y00971 | 37.72 | Above
24 | 34753_at | synaptobrevin-like 1 | SYBL1 | X92396 | 37.72 | Above
25 | 40891_[illegible]_at | DNA segment on chromosome X unique 9879 expressed sequence | DXS9879E | X92896 | 37.15 | Above
26 | 306_s_at | high-mobility group nonhistone chromosomal protein 14 | HMG14 | J02621 | 37.15 | Above
27 | 37640_at | hypoxanthine phosphoribosyltransferase 1 Lesch-Nyhan syndrome | HPRT1 | M31642 | 37.15 | Above
28 | 34829_at | dyskeratosis congenita 1 dyskerin | DKC1 | U59151 | 36.48 | Above
29 | 36169_at | NADH dehydrogenase ubiquinone 1 alpha subcomplex 1 7.5kD MWFE | NDUFA1 | N47307 | 36.48 | Above
30 | 38968_at | SH3-domain binding protein 5 BTK-associated | SH3BP5 | [illegible] | [illegible] | Above
31 | 36128_at | transmembrane trafficking protein | TMP21 | L40397 | 35.88 | Above
32 | 37014_at | myxovirus influenza resistance 1 homolog of murine interferon-inducible protein p78 | MX1 | M33882 | 35.65 | Above
33 | 34374_g_at | upstream regulatory element binding protein 1 | UREB1 | Z97054 | 35.55 | Above
34 | 36542_at | solute carrier family 9 sodium/hydrogen exchanger isoform 6 | SLC9A6 | AF030409 | 35.55 | Above
35 | 688_at | proteasome prosome macropain 26S subunit ATPase 1 | PSMC1 | L02426 | 35.55 | Above
36 | 955_at | calmodulin type I | | HG1862-HT1897 | 35.55 | Above
37 | 35816_at | cystatin B stefin B | CSTB | U46692 | 35.27 | Above
38 | 38459_g_at | Human cytochrome b5 (CYB5) gene | CYB5 | L39945 | 35.18 | Above
39 | 41288_at | matrix Gla protein | MGP | AL036744 | 35.18 | Above
40 | 32251_at | hypothetical protein FLJ21174 | FLJ21174 | AA149307 | 35.14 | Above



Table 5: Genes selected by Chi square for MLL

No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Chi square value | Above/Below Mean
1 | 34306_at | muscleblind Drosophila like | MBNL | AB007888 | 64.07 | Above
2 | 40797_at | a disintegrin and metalloproteinase domain 10 | ADAM10 | AF009615 | 62.85 | Above
3 | 33412_at | LGALS1 Lectin, galactoside-binding, soluble, 1 | LGALS1 | AI535946 | 57.97 | Above
4 | 39338_at | S100 calcium-binding protein A10 annexin II ligand calpactin I light polypeptide p11 | S100A10 | AI201310 | 57.97 | Above
5 | 2062_at | insulin-like growth factor binding protein 7 | IGFBP7 | L19182 | 55.22 | Above
6 | 32193_at | plexin C1 | PLXNC1 | AF030339 | 53.59 | Above
7 | 40518_at | protein tyrosine phosphatase receptor type C | PTPRC | Y00062 | 53.40 | Above
8 | 36777_at | DNA segment on chromosome 12 unique 2489 expressed sequence | D12S2489E | AJ001687 | 51.47 | Above
9 | 32207_at | membrane protein palmitoylated 1 55kD | MPP1 | M64925 | 50.73 | Below
10 | 33859_at | sin3-associated polypeptide 18kD | SAP18 | U96915 | 50.48 | Above
11 | 38391_at | capping protein actin filament gelsolin-like | CAPG | M94345 | 50.26 | Above
12 | 40763_at | Meis1 mouse homolog | MEIS1 | U85707 | 50.26 | Above
13 | 1126_s_at | cell surface glycoprotein CD44 gene | CD44 | L05424 | 50.17 | Above
14 | 34721_at | FK506-binding protein 5 | FKBP5 | U42031 | 50.17 | Above
15 | 37809_at | homeo box A9 | HOXA9 | U41813 | 50.17 | Above
16 | 34861_at | golgi autoantigen golgin subfamily a 3 | GOLGA3 | D63997 | 47.58 | Below
17 | 38194_s_at | immunoglobulin kappa constant | IGKC | M63438 | 46.18 | Below
18 | 657_at | protocadherin gamma subfamily | PCDHGC3 | [illegible] | 46.05 | Above
19 | 36918_at | guanylate cyclase 1 soluble alpha 3 | GUCY1A3 | Y15723 | 43.90 | Above
20 | 32215_i_at | KIAA0878 protein | KIAA0878 | AB020685 | 43.90 | Above
21 | 38160_at | lymphocyte antigen 75 | LY75 | AF011333 | 43.90 | Above
22 | 38413_at | defender against cell death 1 | | D15057 | 43.90 | Above
23 | 1389_at | membrane metallo-endopeptidase neutral endopeptidase enkephalinase CALLA CD10 | MME | J03779 | [illegible] | Below
24 | 34168_at | deoxynucleotidyltransferase terminal | DNTT | [illegible] | [illegible] | Below
25 | 2036_s_at | CD44 antigen homing function and Indian blood group system | CD44 | [illegible] | [illegible] | Above
26 | 40522_at | glutamate-ammonia ligase glutamine synthase | GLUL | X59834 | 42.55 | Above
27 | 854_at | B lymphoid tyrosine kinase | BLK | S76617 | [illegible] | [illegible]
28 | 40067_at | E74-like factor 1 ets domain transcription factor | ELF1 | M82882 | [illegible] | Above
29 | 39756_g_at | X-box binding protein 1 | XBP1 | Z93930 | 39.95 | Below
30 | 36940_at | TGFB1-induced anti-apoptotic factor 1 | TIAF1 | D86970 | 39.82 | Below
31 | 36935_at | RAS p21 protein activator GTPase activating protein 1 | RASA1 | M23379 | 38.77 | Above
32 | 32134_at | testin | DKFZP586B2022 | AL050162 | 38.77 | Above
33 | 39379_at | Homo sapiens mRNA cDNA DKFZp586C1019 from clone DKFZp586C1019 | | AL049397 | 38.77 | Above
34 | 40493_at | Human cell surface glycoprotein CD44 | CD44 | [illegible] | [illegible] | Above
35 | 769_s_at | annexin A2 | ANXA2 | D00017 | [illegible] | Above
36 | [illegible] | acetyl-Coenzyme A acyltransferase 1 peroxisomal 3-oxoacyl-Coenzyme A thiolase | ACAA1 | X14813 | 37.55 | Above
37 | 35983_at | hypothetical protein R32184_1 | R32184_1 | AC004528 | 37.55 | Above
38 | 40519_at | protein tyrosine phosphatase receptor type C | PTPRC | Y00638 | 36.56 | Above
39 | 794_at | protein tyrosine phosphatase non-receptor type 6 | PTPN6 | X62055 | 36.56 | Above
40 | 41234_at | DnaJ Hsp40 homolog subfamily B member 6 | DNAJB6 | AI540318 | 36.56 | Above
Table 6: Genes selected by Chi square for Novel risk group

No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Chi square value | Above/Below Mean
1 | 37960_at | carbohydrate chondroitin 6/keratan sulfotransferase 2 | CHST2 | AB014679 | 175.82 | Above
2 | 31892_at | protein tyrosine phosphatase receptor type M | PTPRM | X58288 | 172.85 | Above
3 | 994_at | protein tyrosine phosphatase receptor type M | PTPRM | X58288 | 172.85 | Above
4 | 995_g_at | protein tyrosine phosphatase receptor type M | PTPRM | X58288 | 172.85 | Above
5 | [illegible] | G protein-coupled receptor 49 | GPR49 | AF062006 | 139.36 | Above
6 | 41073_at | G protein-coupled receptor 49 | GPR49 | AI74[illegible]745 | 139.36 | Above
7 | 34676_at | KIAA1099 protein | KIAA1099 | [illegible] | [illegible] | Above
8 | 36139_at | DKFZP586G0522 protein | DKFZP586G0522 | [illegible] | 127.05 | Above
9 | 37542_at | lipoma HMGIC fusion partner-like 2 | LHFPL2 | D86961 | 120.79 | Above
10 | 41159_at | clathrin heavy polypeptide Hc | CLTC | D21260 | 115.15 | Above
11 | [illegible] | phospholipid transfer protein | PLTP | L26232 | 108.33 | Above
12 | [illegible] | Human retinoid X receptor alpha mRNA 3' UTR partial | RXR | U66306 | 107.39 | Above
13 | 36906_at | cannabinoid receptor 1 brain | CNR1 | U73304 | 107.39 | Above
14 | 39878_at | protocadherin 9 | | [illegible] | [illegible] | Above
15 | 41747_s_at | Human myocyte-specific enhancer factor 2A (MEF2A) gene, last coding exon, and complete cds. | | [illegible] | 99.20 | Above
16 | 33410_at | integrin alpha 6 | ITGA6 | S66213 | 96.17 | Above
17 | 34947_at | phorbolin-like protein MDS019 | MDS019 | [illegible] | 93.59 | Above
18 | 36029_at | chromosome 11 open reading frame 8 | C11ORF8 | [illegible] | [illegible] | Above
19 | 41708_at | KIAA1034 protein | KIAA1034 | AB028957 | 92.60 | Above
20 | 1664_at | insulin-like growth factor 2 | IGF2 | HG3543-HT3739 | 92.60 | Above
21 | 32736_at | HSPC022 protein | HSPC022 | W68830 | 91.62 | Below
22 | 41266_at | integrin alpha 6 | ITGA6 | X53586 | 86.95 | Above
23 | 36566_at | cystinosis nephropathic | CTNS | AJ222967 | 82.89 | Above
24 | 1825_at | IQ motif containing GTPase activating protein 1 | IQGAP1 | L33075 | 81.20 | Below
25 | 1731_at | platelet-derived growth factor receptor alpha polypeptide | PDGFRA | M21574 | 78.22 | Above
26 | 37023_at | lymphocyte cytosolic protein 1 L-plastin | LCP1 | J02923 | 78.22 | Below
27 | [illegible] | carbohydrate N-acetylglucosamine 6-O sulfotransferase 7 | CHST7 | AL022165 | 76.00 | Above
28 | 33411_g_at | integrin alpha 6 | ITGA6 | S66213 | 75.47 | Above
29 | 538_at | CD34 antigen | CD34 | S53911 | 74.86 | Above
30 | 39108_at | lanosterol synthase 2 3-oxidosqualene-lanosterol cyclase | LSS | U22526 | 71.90 | Above
31 | 38364_at | BCE-1 protein | BCE-1 | AF068197 | 71.90 | Above
32 | 40423_at | KIAA0903 protein | KIAA0903 | AB020710 | 71.29 | Above
33 | 35192_at | glycine dehydrogenase decarboxylating glycine decarboxylase glycine cleavage system protein P | GLDC | D90239 | 71.29 | Above
34 | 39037_at | myeloid/lymphoid or mixed-lineage leukemia trithorax Drosophila homolog translocated to 2 | MLLT2 | L13773 | 71.29 | Above
35 | 38747_at | Human CD34 gene, exon 8. | CD34 | M81945 | 69.45 | Above
36 | 37687_i_at | Fc fragment of IgG low affinity IIa receptor for CD32 | FCGR2A | M31932 | 67.75 | Above
37 | 1857_at | MAD mothers against decapentaplegic Drosophila homolog 7 | MADH7 | AF010193 | 66.28 | Above
38 | 38618_at | Human PAC clone RP3-515N1 from 22q11.2-q22 | LIMK2 | AC002073 | 64.03 | Above
39 | 31782_at | prostaglandin D2 receptor DP | PTGDR | U31099 | 61.92 | Above
40 | 32842_at | B-cell CLL/lymphoma 7A | BCL7A | X89984 | 61.57 | Above



Table 7. Genes selected by Chi square for T-ALL

No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Chi square value | Above/Below Mean
1 | 38319_at | CD3D antigen delta polypeptide TiT3 complex | CD3D | AA919102 | 215.00 | Above
2 | 1096_g_at | CD19 antigen | CD19 | M28170 | 206.48 | Below
3 | 38242_at | B cell linker protein | SLP65 | AF068180 | 198.52 | Below
4 | 32794_g_at | T cell receptor beta locus | TRB | X00437 | 197.71 | Above
5 | 37988_at | CD79B antigen immunoglobulin-associated beta | CD79B | M89957 | 197.71 | Below
6 | 38017_at | CD79A antigen immunoglobulin-associated alpha | CD79A | U05259 | 197.53 | Below
7 | 35016_at | Human Ia-associated invariant gamma-chain gene, exon 8, clones lambda-y(1,2,3). | M13560 | M13560 | 197.53 | Below
8 | 36277_at | Human membrane protein (CD3-epsilon) gene, exon 9. | CD3E | M23323 | 197.53 | Above
9 | 38095_i_at | major histocompatibility complex class II DP beta 1 | HLA-DPB1 | M83664 | 191.09 | Below
10 | 39318_at | T-cell leukemia/lymphoma 1A | TCL1A | X82240 | 189.78 | Below
11 | 38147_at | SH2 domain protein 1A Duncan's disease lymphoproliferative syndrome | SH2D1A | AL023657 | 189.78 | Above
12 | 41723_s_at | major histocompatibility complex class II DR beta 1 | HLA-DRB1 | M32578 | 189.25 | Below
13 | 38833_at | Human mRNA for SB classII histocompatibility antigen alpha-chain | | X00457 | 189.03 | Below
14 | [illegible] | Human T-lymphocyte specific protein tyrosine kinase p56lck (lck) aberrant mRNA | LCK | [illegible] | [illegible] | Above
15 | 37039_at | major histocompatibility complex class II DR alpha | HLA-DRA | J00194 | 188.93 | Below
16 | 38051_at | mal T-cell differentiation protein | MAL | X76220 | 188.93 | Above
17 | 37344_at | major histocompatibility complex class II DM alpha | HLA-DMA | X62744 | 187.25 | Below
18 | 38096_f_at | major histocompatibility complex class II DP beta 1 | HLA-DPB1 | M83664 | 182.38 | Below
19 | [illegible] | lymphocyte-specific protein tyrosine kinase | LCK | [illegible] | [illegible] | Above
20 | 1105_s_at | T cell receptor beta locus | TRB | M12886 | 180.45 | Above
21 | 32649_at | transcription factor 7 T-cell specific HMG-box | TCF7 | X59871 | 177.84 | Above
22 | 38949_at | protein kinase C theta | PRKCQ | L01087 | 172.59 | Below
23 | 39709_at | selenoprotein W 1 | SEPW1 | U67171 | 171.96 | Above
24 | [illegible] | immunoglobulin heavy constant mu | IGHM | [illegible] | [illegible] | Below
25 | 3647[illegible]_at | ubiquitin specific protease 20 | USP20 | [illegible] | 167.77 | Above
26 | 266_s_at | CD24 antigen small cell lung carcinoma cluster 4 antigen | CD24 | L33930 | 165.56 | Below
27 | 40570_at | forkhead box O1A rhabdomyosarcoma | FOXO1A | AF032885 | 165.29 | Below
28 | [illegible] | integral membrane protein 2A | ITM2A | [illegible] | 164.14 | Above
29 | [illegible] | Human DNA sequence from clone RP3-377H14 on chromosome 6p21.32-22.1. | | [illegible] | 164.14 | Below
30 | 1085_s_at | phospholipase C gamma 2 phosphatidylinositol-specific | PLCG2 | M37238 | 161.30 | Below
31 | 38018_g_at | CD79A antigen immunoglobulin-associated alpha | CD79A | U05259 | 160.51 | Below
32 | [illegible] | nucleobindin 2 | NUCB2 | [illegible] | 160.0[illegible] | Above
33 | 41166_at | immunoglobulin heavy constant mu | IGHM | X58529 | 158.50 | Below
34 | 38415_at | protein tyrosine phosphatase type IVA member 2 | PTP4A2 | U14603 | 155.78 | Above
35 | 38893_at | neutrophil cytosolic factor 4 40kD | NCF4 | AL008637 | 155.78 | Below
36 | 1241_at | protein tyrosine phosphatase type IVA member 2 | PTP4A2 | U14603 | 155.78 | Above
37 | [illegible] | T cell receptor beta locus | TRB | X00437 | 155.43 | Above
38 | 36571_at | topoisomerase DNA II beta 180kD | TOP2B | X68060 | 152.16 | Below
39 | 37399_at | aldo-keto reductase family 1 member C3 3-alpha hydroxysteroid dehydrogenase type II | AKR1C3 | D17793 | 151.93 | Above
40 | 41097_at | telomeric repeat binding factor 2 | TERF2 | AF002999 | 151.86 | Below
Table 8. Genes selected by Chi square for TEL-AML1

No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Chi square value | Above/Below Mean
1 | 38652_at | hypothetical protein FLJ20154 | FLJ20154 | AF070644 | 137.92 | Above
2 | 36239_at | POU domain class 2 associating factor 1 | POU2AF1 | Z49194 | 131.43 | Above
3 | 41442_at | core-binding factor runt domain alpha subunit 2 translocated to 3 | CBFA2T3 | AB010419 | 130.17 | Above
4 | 37780_at | piccolo presynaptic cytomatrix protein | PCLO | AB011131 | 126.79 | Above
5 | 36985_at | isopentenyl-diphosphate delta isomerase | IDI1 | X17025 | 125.47 | Above
6 | 38578_at | tumor necrosis factor receptor superfamily member 7 | TNFRSF7 | M63928 | 115.72 | Above
7 | 38203_at | potassium intermediate/small conductance calcium-activated channel subfamily N member 1 | KCNN1 | U69883 | 112.87 | Above
8 | 35614_at | transcription factor-like 5 basic helix-loop-helix | TCFL5 | [illegible] | 108.45 | Above
9 | 32224_at | KIAA0769 gene product | KIAA0769 | AB018312 | 107.08 | Above
10 | 32730_at | Homo sapiens mRNA for KIAA1750 protein partial cds | | [illegible] | [illegible] | Above
11 | 35665_at | phosphoinositide-3-kinase class 3 | PIK3C3 | [illegible] | 104.83 | Above
12 | 1077_at | recombination activating gene 1 | RAG1 | M29474 | 102.90 | Above
13 | 36524_at | Rho guanine nucleotide exchange factor GEF 4 | ARHGEF4 | AB029035 | 100.67 | Above
14 | 34194_at | Homo sapiens cDNA FLJ21697 fis clone COL09740 | | AL049313 | 98.31 | Above
15 | 36937_s_at | PDZ and LIM domain 1 elfin | PDLIM1 | U90878 | 96.91 | Below
16 | 36008_at | protein tyrosine phosphatase type IVA member 3 | PTP4A3 | AF041434 | 96.68 | Above
17 | 1299_at | telomeric repeat binding factor 2 | TERF2 | X93512 | 93.08 | Above
18 | 41814_at | fucosidase alpha-L- 1 tissue | FUCA1 | M29877 | 92.77 | Above
19 | 41200_at | CD36 antigen collagen type I receptor thrombospondin receptor like 1 | CD36L1 | Z22555 | 90.86 | Above
20 | 35238_at | TNF receptor-associated factor 5 | TRAF5 | AB000509 | 90.81 | Above
21 | 880_at | FK506-binding protein 1A 12kD | FKBP1A | M34539 | 86.69 | Above
22 | 33690_at | Homo sapiens mRNA cDNA DKFZp434A202 from clone DKFZp434A202 | | AL080190 | 86.69 | Above
23 | 40272_at | collapsin response mediator protein 1 | CRMP1 | D78012 | 85.44 | Above
24 | 35362_at | myosin X | MYO10 | AB018342 | 83.60 | Above
25 | 41819_at | FYN-binding protein FYB-120/130 | FYB | U93049 | 83.25 | Above
26 | 40279_at | KIAA0121 gene product | KIAA0121 | D50911 | 81.66 | Above
27 | 1488_at | protein tyrosine phosphatase receptor type K | PTPRK | L77886 | 81.66 | Above
28 | 1325_at | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59423 | 81.17 | Above
29 | [illegible] | guanine nucleotide binding protein 11 | GNG11 | U31384 | 80.37 | Above
30 | 769_s_at | annexin A2 | ANXA2 | D00017 | 78.68 | Below
31 | 33415_at | non-metastatic cells 2 protein NM23B expressed in | NME2 | X58965 | 77.04 | Below
32 | 1980_s_at | non-metastatic cells 2 protein NM23B expressed in | | [illegible] | 76.35 | Below
33 | 32579_at | SWI/SNF related matrix associated actin dependent regulator of chromatin subfamily a member 4 | | [illegible] | 76.35 | Above
34 | [illegible] | thioredoxin reductase 1 | TXNRD1 | X91247 | 75.97 | Above
35 | 755_at | inositol 1 4 5-triphosphate receptor type 1 | ITPR1 | [illegible] | 75.56 | Above






75.11 


Above 


36 


37343__at 


inositol 1 4 5 -triphosphate 


ITPR3 


U01062 






receptor type 3 






73.96 


Above 


37 


1336_s_at 


protein kinase C beta 1 


PRKCB1 


X06318 


38 


41097_at 


telomeric repeat binding factor 2 TERP2 


AF002999 


73.84 


Above 


39 


317S6_at 


Sam68-like phosphotyrosine 


T-STAR 


AF051321 


73.72 


Above 






protein T-STAR 








Above 


40 


160029_at 


protein kinase C beta 1 


PRKCB1 


X07109 


73.66 



2. Correlation-based Feature Selection (CFS)
Correlation-based Feature Selection (CFS) is a method that evaluates
subsets of genes rather than individual genes (Hall and Holmes (2000)
"Benchmarking Attribute Selection Techniques for Data Mining," Working
Paper 00/10, Department of Computer Science, University of Waikato, New Zealand).
The core of the algorithm is a subset evaluation heuristic that takes into account the
usefulness of individual features for predicting the class along with the level of
intercorrelation among them, with the belief that "good feature subsets contain
features highly correlated with the class, yet uncorrelated with each other". The
heuristic assigns a score Merit_S to a subset S containing k genes, defined as
Merit_S = (k * r_cf)/sqrt(k + k * (k - 1) * r_ff), where r_cf is the average gene-class
correlation and r_ff is the average gene-gene correlation. Like the Chi square method,
CFS first discretizes the gene expressions into intervals and then calculates a matrix
of gene-class and gene-gene correlations from the training data for merit calculation.
The correlation between two genes, or between a gene and a class, is calculated as
r_xy = 2 * [H(X) + H(Y) - H(X,Y)]/[H(X) + H(Y)], where H(X) is the entropy of a gene X
and H(X,Y) is the joint entropy of X and Y. CFS starts
from an empty set of genes and uses the best-first search technique with a stopping
criterion of 5 consecutive fully expanded non-improving subsets. The subset with the
highest merit found during the search is selected. Tables 9-15 list the top gene subsets
chosen by CFS for each subtype. For subtype prediction, each gene subset must be
used in its entirety, as within each subset, all genes are equally ranked.
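
The merit heuristic and the entropy-based correlation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used to produce Tables 9-15: the function names (entropy, symmetric_uncertainty, merit, cfs_best_first), the dictionary layout of the already-discretized expression data, and the simplified best-first loop (which counts each fully expanded subset that fails to raise the best merit) are all hypothetical.

```python
import heapq
import math
from collections import Counter
from itertools import combinations


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(x, y):
    """r_xy = 2 * [H(X) + H(Y) - H(X,Y)] / [H(X) + H(Y)]."""
    hx, hy = entropy(x), entropy(y)
    if hx + hy == 0:
        return 0.0  # both variables are constant; treat as uncorrelated
    hxy = entropy(list(zip(x, y)))  # joint entropy H(X,Y)
    return 2.0 * (hx + hy - hxy) / (hx + hy)


def merit(subset, features, labels):
    """Merit_S = (k * r_cf) / sqrt(k + k * (k - 1) * r_ff)."""
    k = len(subset)
    if k == 0:
        return 0.0
    # Average gene-class correlation over the subset.
    r_cf = sum(symmetric_uncertainty(features[g], labels) for g in subset) / k
    # Average gene-gene correlation over all pairs in the subset.
    pairs = list(combinations(subset, 2))
    r_ff = 0.0
    if pairs:
        r_ff = sum(symmetric_uncertainty(features[a], features[b])
                   for a, b in pairs) / len(pairs)
    return (k * r_cf) / math.sqrt(k + k * (k - 1) * r_ff)


def cfs_best_first(features, labels, max_stale=5):
    """Best-first forward search that stops after `max_stale` consecutive
    fully expanded subsets without an improvement in the best merit."""
    genes = sorted(features)
    best_subset, best_merit = (), 0.0
    open_heap = [(0.0, ())]  # entries are (-merit, subset); start from the empty set
    visited = {()}
    stale = 0
    while open_heap and stale < max_stale:
        _, subset = heapq.heappop(open_heap)  # most promising unexpanded subset
        improved = False
        for g in genes:
            if g in subset:
                continue
            child = tuple(sorted(subset + (g,)))
            if child in visited:
                continue
            visited.add(child)
            m = merit(child, features, labels)
            heapq.heappush(open_heap, (-m, child))
            if m > best_merit + 1e-12:
                best_subset, best_merit, improved = child, m, True
        stale = 0 if improved else stale + 1
    return best_subset, best_merit


if __name__ == "__main__":
    # Toy, already-discretized data: g1 tracks the class, g2 is noise,
    # g3 duplicates g1 and therefore adds only redundancy.
    toy = {"g1": [0, 0, 1, 1], "g2": [0, 1, 0, 1], "g3": [0, 0, 1, 1]}
    classes = [0, 0, 1, 1]
    print(cfs_best_first(toy, classes))
```

In this sketch a redundant gene does not raise the merit of a subset, because its contribution to r_cf is offset by the increase in r_ff; that is the behavior the quoted heuristic is meant to capture.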



Table 9. Genes selected by CFS: BCR-ABL

No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Above/Below Mean
1 | 36650_at | cyclin D2 | CCND2 | D13639 | Above
2 | 40196_at | HYA22 protein | HYA22 | D88153 | Above
3 | 1635_at | proto-oncogene tyrosine-protein kinase (ABL) gene | ABL | U07563 | Above
4 | 33775_s_at | caspase 8 apoptosis-related cysteine protease | CASP8 | X98176 | Above
5 | 1636_g_at | proto-oncogene tyrosine-protein kinase (ABL) gene | ABL | U07563 | Above
6 | 41295_at | GTT1 protein | GTT1 | AL041780 | Above
7 | 1326_at | caspase 10 apoptosis-related cysteine protease | CASP10 | U60519 | Above
8 | 33150_at | disrupter of silencing 10 | SAS10 | AI126004 | Above
9 | 40051_at | TRAM-like protein | KIAA0057 | D31762 | Above
10 | 39061_at | bone marrow stromal cell antigen 2 | BST2 | D28137 | Above
11 | 33172_at | hypothetical protein FLJ10849 | FLJ10849 | T75292 | Above
12 | 37399_at | aldo-keto reductase family 1 member C3 3-alpha hydroxysteroid dehydrogenase type II | AKR1C3 | D17793 | Above
13 | 317_at | protease cysteine 1 legumain | PRSC1 | D55696 | Above
14 | 330_s_at | tubulin, alpha 1, isoform 44 | TUBA1 | HG2259-HT2348 | Above
15 | 38578_at | tumor necrosis factor receptor superfamily member 7 | TNFRSF7 | M63928 | Above
16 | 39044_s_at | diacylglycerol kinase delta 130kD | DGKD | D73409 | Below
17 | 32562_at | endoglin Osler-Rendu-Weber syndrome 1 | ENG | X72012 | Above
18 | 38641_at | Homo sapiens mRNA for TSC-22-like protein | - | AJ133115 | Above
19 | 1211_s_at | CASP2 and RIPK1 domain containing adaptor with death domain | CRADD | U84388 | Above
20 | 39730_at | v-abl Abelson murine leukemia viral oncogene homolog 1 | ABL1 | X16416 | Above
21 | 36591_at | tubulin alpha 1 testis specific | TUBA1 | X06956 | Above
22 | 36035_at | anchor attachment protein 1 Gaa1p yeast homolog | GPAA1 | AB002135 | Above
23 | 980_at | Niemann-Pick disease type C1 | NPC1 | - | -
24 | 40698_at | C-type calcium dependent carbohydrate-recognition domain lectin superfamily member 2 activation-induced | CLECSF2 | X96719 | Above
25 | 39330_s_at | actinin alpha 1 | ACTN1 | M95178 | Above
26 | 2001_g_at | ataxia telangiectasia mutated includes complementation groups A C and D | ATM | U26455 | Above
27 | 39319_at | lymphocyte cytosolic protein 2 SH2 domain-containing leukocyte protein of 76kD | [illegible] | U20158 | Above
28 | 37685_at | Clathrin assembly lymphoid-myeloid leukemia gene | CLTH | U45976 | Above
29 | 33813_at | tumor necrosis factor receptor superfamily member 1B | TNFRSF1B | AI813532 | Above
30 | 33134_at | adenylate cyclase 3 | ADCY3 | AB011083 | Above
31 | 36536_at | schwannomin interacting protein 1 | SCHIP-1 | AF070614 | Above
32 | 36985_at | isopentenyl-diphosphate delta isomerase | IDI1 | X17025 | Below
33 | [illegible] | Sm protein F | LSM6 | AA917945 | Above
34 | 33774_at | caspase 8 apoptosis-related cysteine protease | CASP8 | X98172 | Above
35 | 37470_at | leukocyte-associated Ig-like receptor 1 | LAIR1 | AF013249 | Above
36 | 39245_at | Human 40871 mRNA partial sequence | - | U72507 | Above
37 | 40076_at | tumor protein D52-like 2 | TPD52L2 | AF004430 | Below
38 | 39370_at | Microtubule-associated proteins 1A and 1B light chain 3 | MAP1ALC3 | W28807 | Below
39 | 41594_at | Janus kinase 1 a protein tyrosine kinase | - | M64174 | Above
40 | 41338_at | amino-terminal enhancer of split | - | AI969192 | Below
41 | 32319_at | tumor necrosis factor ligand superfamily member 4 tax-transcriptionally activated glycoprotein 1 34kD | TNFSF4 | AL022310 | Above
42 | 33924_at | KIAA1091 protein | KIAA1091 | AB029014 | Above
43 | 37397_at | platelet/endothelial cell adhesion molecule-1 (PECAM-1) gene | PECAM | L34657 | Above
44 | 37190_at | WAS protein family member 1 | WASF1 | D87459 | Below
45 | 39070_at | singed Drosophila like sea urchin fascin homolog like | SNL | U03057 | Above
46 | 38994_at | STAT induced STAT inhibitor-2 | STATI2 | AF037989 | Above
47 | 32621_at | down-regulator of transcription 1 TBP-binding negative cofactor 2 | DR1 | M97388 | Above
48 | 40108_at | KIAA0005 gene product | KIAA0005 | D13630 | Below
49 | 35238_at | TNF receptor-associated factor 5 | TRAF5 | AB000509 | Above
50 | 1558_g_at | p21/Cdc42/Rac1-activated kinase 1 yeast Ste20-related | PAK1 | U24152 | Above
- | - | transcription factor 3 E2A immunoglobulin enhancer binding factors E12/E47 | TCF3 | M31523 | Below
- | - | integrin alpha 4 antigen CD49D alpha 4 subunit of VLA-4 receptor | ITGA4 | X16983 | Above
- | - | suppressor of clear C. elegans homolog of | SHOC2 | AB020669 | Below



Table 10. Gene selected by CFS for E2A-PBX1

No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Above/Below Mean
1 | 33355_at | Homo sapiens cDNA FLJ12900 fis clone NT2RP2004321 (by CELERA search of target sequence = PBX1) | PBX1 | AL049381 | Above



Table 11. Genes selected by CFS for: Hyperdiploid >50

No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Above/Below Mean
1 | 36620_at | superoxide dismutase 1 soluble amyotrophic lateral sclerosis 1 adult | SOD1 | X02317 | Above
2 | 37350_at | clone 889N15 on chromosome Xq22.1-22.3. Contains part of the gene for a novel protein similar to X. laevis Cortical Thymocyte Marker CTX | PSMD10 | AL031177 | Above
3 | 41724_at | accessory proteins BAP31/BAP29 | DXS1357E | X81109 | Above
4 | 38738_at | SMT3 suppressor of mif two 3 yeast homolog 1 | SMT3H1 | X99584 | Above
5 | 40480_s_at | FYN oncogene related to SRC FGR YES | FYN | M14333 | Above
6 | 38518_at | sex comb on midleg Drosophila like 2 | SCML2 | Y18004 | Above
7 | 31492_at | muscle specific gene | M9 | AB019392 | Below
8 | 35688_g_at | mature T-cell proliferation 1 | MTCP1 | Z24459 | Above
9 | 35939_s_at | POU domain class 4 transcription factor 1 | POU4F1 | L20433 | Above
10 | 36128_at | transmembrane trafficking protein | TMP21 | L40397 | Above
11 | 37014_at | myxovirus influenza resistance 1 homolog of murine interferon-inducible protein p78 | MX1 | M33882 | Above
12 | 34374_g_at | upstream regulatory element binding protein 1 | UREB1 | Z97054 | Above
13 | 688_at | proteasome prosome macropain 26S subunit ATPase 1 | PSMC1 | L02426 | Above
14 | 39878_at | protocadherin 9 | PCDH9 | AI524125 | Below
15 | 38771_at | histone deacetylase 1 | HDAC1 | D50405 | Below
16 | 865_at | ribosomal protein S6 kinase 90kD polypeptide 3 | RPS6KA3 | [illegible] | Above
17 | 41143_at | calmodulin (CALM1) gene | CALM1 | U12022 | Above
18 | 39867_at | Tu translation elongation factor mitochondrial | TUFM | S75463 | Below
19 | 41470_at | prominin mouse like 1 | PROML1 | AF027208 | Above
20 | 41503_at | KIAA0854 protein | KIAA0854 | AB020661 | Below
21 | 2039_s_at | FYN oncogene related to SRC FGR | FYN | M14333 | Above
22 | 36845_at | KIAA0136 protein | KIAA0136 | D50926 | Above
23 | 36940_at | TGFB1-induced anti-apoptotic factor 1 | TIAF1 | D86970 | Above
24 | 32236_at | ubiquitin-conjugating enzyme E2G 2 homologous to yeast UBC7 | - | - | Above
25 | 36885_at | spleen tyrosine kinase | SYK | - | Below
26 | 40200_at | heat shock transcription factor 1 | HSF1 | [illegible] | Below
27 | 40842_at | U1 snRNP-specific protein A gene | SNRPA | M60784 | Below
28 | 40514_at | hypothetical 43.2 Kd protein | LOC51614 | AF091085 | Below
29 | 41222_at | signal transducer and activator of transcription 6 (STAT6) gene | STAT6 | AF067575 | Below
30 | 1294_at | ubiquitin-activating enzyme E1-like | UBE1L | L13852 | Below
31 | 34315_at | AFG3 ATPase family gene 3 yeast like 2 | AFG3L2 | Y18314 | Above
32 | 39806_at | DKFZP547E2110 protein | DKFZP547E2110 | AL050261 | Above
33 | 40875_s_at | small nuclear ribonucleoprotein 70kD polypeptide RNP antigen | SNRP70 | X06815 | Below
34 | 38458_at | cytochrome b5 (CYB5) gene | CYB5 | L39945 | Above
35 | 1817_at | prefoldin 5 | PFDN5 | D89667 | Below
36 | 34709_r_at | stromal antigen 2 | STAG2 | Z75331 | Above
37 | 33447_at | myosin light polypeptide regulatory non-sarcomeric | MLCB | X54304 | Above
38 | 1077_at | recombination activating gene 1 | RAG1 | M29474 | Below
39 | 1915_s_at | v-fos FBJ murine osteosarcoma viral oncogene homolog | FOS | V01512 | Above
40 | 38854_at | KIAA0635 gene product | KIAA0635 | AB014535 | Above
41 | 37732_at | RING1 and YY1 binding protein | RYBP | AL049940 | Above
42 | 35940_at | POU domain class 4 transcription factor 1 | POU4F1 | X64624 | Above
43 | 34?33_at | splicing factor 3a subunit 1 120kD | SF3A1 | X85237 | Below
44 | 245_at | selectin L lymphocyte adhesion molecule 1 | SELL | M25280 | Below
45 | 40146_at | RAP1B member of RAS oncogene family | RAP1B | AL080212 | Below
46 | 40104_at | serine/threonine kinase 25 Ste20 yeast homolog | STK25 | D63780 | Below
47 | 430_at | nucleoside phosphorylase | NP | X00737 | Above
48 | 36899_at | special AT-rich sequence binding protein 1 binds to nuclear matrix/scaffold-associating DNA s | SATB1 | M97287 | Below
49 | 35727_at | hypothetical protein FLJ20517 | FLJ20517 | AI249721 | Below
50 | 38649_at | KIAA0970 protein | KIAA0970 | [illegible] | Below
51 | 36107_at | ATP synthase H transporting mitochondrial F0 complex subunit F6 | ATP5J | AA845575 | Above
52 | 38789_at | transketolase Wernicke-Korsakoff syndrome | TKT | L12711 | Below
53 | 39301_at | calpain 3 p94 | CAPN3 | X85030 | Below
54 | 41278_at | - | BAF53A | AF041474 | Below
55 | 41162_at | protein phosphatase 1G formerly 2C magnesium-dependent gamma isoform | - | Y13936 | Below
56 | 37819_at | hypothetical protein | LOC54104 | AF007130 | Below
57 | 38717_at | DKFZP586A0522 protein | DKFZP586A0522 | AL050159 | Below
58 | 40019_at | ecotropic viral integration site 2B | EVI2B | M60830 | Above
59 | 39489_g_at | protocadherin 9 | PCDH9 | [illegible] | Below
60 | 857_at | protein phosphatase 1A formerly 2C magnesium-dependent alpha isoform | PPM1A | [illegible] | Above
61 | 32804_at | RNA binding motif protein 5 | RBM5 | AF091263 | Below
62 | 37676_at | phosphodiesterase 8A | PDE8A | AF056490 | Below
63 | 1519_at | v-ets avian erythroblastosis virus E26 oncogene homolog 2 | ETS2 | J04102 | Above
64 | 37680_at | A kinase PRKA anchor protein gravin 12 | AKAP12 | U81607 | Below
65 | 548_s_at | spleen tyrosine kinase | SYK | S80267 | Below
66 | 39797_at | KIAA0349 protein | KIAA0349 | AB002347 | Above
67 | - | nuclear cap binding protein subunit 2 20kD | NCBP2 | - | Below
68 | 38091_at | lectin galactoside-binding soluble 9 galectin 9 | LGALS9 | Z49107 | Below
69 | 41223_at | cytochrome c oxidase subunit Va | COX5A | M22760 | Below
70 | 933_f_at | zinc finger protein 91 HPF7 HTF10 | ZNF91 | L11672 | Below
71 | 37012_at | capping protein actin filament muscle Z-line beta | CAPZB | U03271 | Below
72 | 35214_at | UDP-glucose dehydrogenase | UGDH | AF061016 | Above
73 | 32434_at | myristoylated alanine-rich protein kinase C substrate MARCKS 80K-L | MACS | D10522 | Above
74 | 38345_at | centrosomal protein 1 | CEP1 | [illegible] | Below
75 | 40404_s_at | CDC16 cell division cycle 16 S. cerevisiae homolog | CDC16 | U18291 | Below
76 | 39096_at | SON DNA binding protein | SON | AB028942 | Above
77 | 33429_at | DKFZP586M1523 protein | DKFZP586M1523 | AL050225 | Above
78 | 40641_at | TBP-associated factor 172 | TAF-172 | AF038362 | Above
79 | 41381_at | KIAA0308 protein | KIAA0308 | AB002306 | Below
80 | 35135_at | Homo sapiens Similar to CG15084 gene product clone MGC10471 mRNA complete cds | - | X13956 | Below
81 | 39421_at | runt-related transcription factor 1 acute myeloid leukemia 1 aml1 oncogene | RUNX1 | D43969 | Below
82 | 195_s_at | caspase 4 apoptosis-related cysteine protease | CASP4 | U28014 | Below
83 | 36898_r_at | primase polypeptide 2A 58kD | PRIM2A | X74331 | Above
84 | 38792_at | spermine synthase | SMS | AD001528 | Above
85 | 32643_at | glucan 1 4-alpha- branching enzyme 1 glycogen branching enzyme Andersen disease glycogen storage disease type IV | GBE1 | L07956 | Below
86 | 38808_at | cell membrane glycoprotein 110000Mr surface antigen | GP110 | D64154 | Below
87 | 36062_at | Leupaxin | LPXN | AF062075 | Below
88 | 300_f_at | transcription factor BTF3 homolog (GB:M90355) | - | HG4518-HT4921 | Below
89 | 1979_s_at | nucleolar protein 1 120kD | NOL1 | X55504 | Below
90 | 32230_at | eukaryotic translation initiation factor 3 subunit 2 beta 36kD | EIF3S2 | U39067 | Below
91 | 39893_at | guanine nucleotide binding protein G protein gamma 7 | GNG7 | AB010414 | Below
92 | 34651_at | catechol-O-methyltransferase | COMT | M58525 | Above
93 | 1052_s_at | CCAAT/enhancer binding protein C/EBP delta | CEBPD | M83667 | Below
94 | 36272_r_at | peripheral myelin protein 2 | PMP2 | X62167 | Below
95 | 2044_s_at | retinoblastoma 1 including osteosarcoma | RB1 | M15400 | Below
96 | 32135_at | sterol regulatory element binding transcription factor 1 | SREBF1 | U00968 | Below



Table 12. Genes selected by CFS for MLL

No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Above/Below Mean
1 | 34306_at | muscleblind Drosophila like | MBNL | AB007888 | Above
2 | 40797_at | a disintegrin and metalloproteinase domain 10 | ADAM10 | AF009615 | Above
3 | 33412_at | LGALS1 Lectin, galactoside-binding, soluble, 1 (galectin 1) | LGALS1 | AI535946 | Above
4 | 39338_at | S100 calcium-binding protein A10 annexin II ligand calpactin I light polypeptide p11 | S100A10 | AI201310 | Above
5 | 2062_at | insulin-like growth factor binding protein 7 | IGFBP7 | L19182 | Above
6 | 32193_at | plexin C1 | PLXNC1 | AF030339 | Above
7 | 40518_at | protein tyrosine phosphatase receptor type C | PTPRC | Y00062 | Above
8 | 36777_at | DNA segment on chromosome 12 unique 2489 expressed sequence | D12S2489E | AJ001687 | Above
9 | 38391_at | capping protein actin filament gelsolin-like | CAPG | M94345 | Above
10 | 40763_at | Meis1 mouse homolog | MEIS1 | U85707 | Above
11 | 34721_at | FK506-binding protein 5 | FKBP5 | U42031 | Above
12 | 37809_at | homeo box A9 | HOXA9 | U41813 | Above
13 | 32215_i_at | KIAA0878 protein | KIAA0878 | AB020685 | Above
14 | 38160_at | lymphocyte antigen 75 | LY75 | AF011333 | Above
15 | 1389_at | membrane metallo-endopeptidase neutral endopeptidase enkephalinase CALLA CD10 | MME | J03779 | Below
16 | 34168_at | deoxynucleotidyltransferase terminal | DNTT | M11722 | Below
17 | 40522_at | glutamate-ammonia ligase glutamine synthase | GLUL | X59834 | Above
18 | 854_at | B lymphoid tyrosine kinase | BLK | S76617 | Above
19 | 40067_at | E74-like factor 1 ets domain transcription factor | ELF1 | M82882 | Above
20 | 39756_g_at | X-box binding protein 1 | XBP1 | Z93930 | Below
21 | 32134_at | Testing | DKFZP586B2022 | AL050162 | Above
22 | 39379_at | Homo sapiens mRNA cDNA DKFZp586C1019 from clone DKFZp586C1019 | - | AL049397 | Above
23 | 40415_at | acetyl-Coenzyme A acyltransferase 1 peroxisomal 3-oxoacyl-Coenzyme A thiolase | ACAA1 | X14813 | Above
24 | 40519_at | protein tyrosine phosphatase receptor type C | PTPRC | Y00638 | Above
25 | 33847_s_at | cyclin-dependent kinase inhibitor 1B p27 Kip1 | CDKN1B | U10906 | Above
26 | 32696_at | pre-B-cell leukemia transcription factor 3 | PBX3 | X59841 | Above
27 | 40417_at | KIAA0098 protein | - | D43950 | Above
28 | 1644_at | eukaryotic translation initiation factor 3 subunit 2 beta 36kD | EIF3S2 | U36764 | Above
29 | 948_s_at | peptidylprolyl isomerase D cyclophilin D | PPID | D63861 | Above
30 | 34337_s_at | putative DNA binding protein | M96 | AJ010014 | Below
31 | 41747_s_at | myocyte-specific enhancer factor 2A (MEF2A) gene | MEF2A | U49020 | Above
32 | 39516_at | hypothetical protein | HSPC004 | AI827793 | Above
33 | 31820_at | hematopoietic cell-specific Lyn substrate 1 | HCLS1 | X16663 | Above
34 | 33305_at | serine or cysteine proteinase inhibitor clade B ovalbumin member 1 | SERPINB1 | M93056 | Above
35 | 40520_g_at | protein tyrosine phosphatase receptor type C | PTPRC | Y00638 | Above
36 | 41222_at | signal transducer and activator of transcription 6 (STAT6) gene | STAT6 | AF067575 | Above
37 | 1718_at | actin related protein 2/3 complex subunit 2 34 kD | [illegible] | - | Above
38 | 38342_at | KIAA0239 protein | KIAA0239 | D87076 | Below
39 | 38805_at | TG-interacting factor TALE family homeobox | TGIF | X89750 | Below
40 | 32089_at | sperm associated antigen 6 | SPAG6 | [illegible] | Above
41 | 1950_s_at | Smad 3, exon 1 | - | - | Above
42 | 39410_at | development and differentiation enhancing factor 2 | - | AB007860 | Above
43 | 37280_at | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59912 | Below
44 | 32607_at | brain acid-soluble protein 1 | BASP1 | AF039656 | Above
45 | - | CD9 antigen p24 | CD9 | M38690 | Below
46 | 40913_at | ATPase Ca transporting plasma membrane 4 | ATP2B4 | - | Below
47 | 1039_s_at | hypoxia-inducible factor 1 alpha subunit basic helix-loop-helix transcription factor | HIF1A | U22431 | Below
48 | 35939_s_at | POU domain class 4 transcription factor 1 | POU4F1 | L20433 | Below
49 | 963_at | ligase IV DNA ATP-dependent | LIG4 | X83441 | Below
50 | 39628_at | RAB9 member RAS oncogene family | RAB9 | U44103 | Below
51 | 38242_at | B cell linker protein | SLP65 | AF068180 | Below
52 | 37692_at | diazepam binding inhibitor GABA receptor modulator acyl-Coenzyme A binding protein | DBI | AI557240 | Above
53 | 32166_at | KIAA1027 protein | KIAA1027 | AB028950 | Above
54 | 34800_at | DKFZP586O1624 protein | DKFZP586O1624 | AL039458 | Below
55 | 34386_at | methyl-CpG binding domain protein 4 | MBD4 | AF072250 | Below
56 | 40296_at | hypothetical protein | 753P9 | AL023653 | Below
57 | 40456_at | up-regulated by BCG-CWS | LOC64116 | AL049963 | Above
58 | - | ferritin heavy polypeptide 1 | FTH1 | L20941 | Below
59 | 39049_at | G18.1a and G18.1b proteins (G18.1a and G18.1b genes, located in the class III region of the major histocompatibility complex) | - | AJ243937 | Below
60 | 38075_at | synaptophysin-like protein | SYPL | X68194 | Above
61 | 932_i_at | zinc finger protein 91 HPF7 HTF10 | ZNF91 | L11672 | Below
62 | 1825_at | IQ motif containing GTPase activating protein 1 | IQGAP1 | L33075 | Above
63 | 34210_at | CDW52 antigen CAMPATH-1 antigen | CDW52 | N90866 | Below
64 | 39778_at | mannosyl alpha-1 3- glycoprotein beta-1 2-N-acetylglucosaminyltransferase | MGAT1 | M55621 | Below
65 | 34699_at | CD2-associated protein | CD2AP | AL050105 | Below
66 | 40066_at | ubiquitin-activating enzyme E1C homologous to yeast UBA3 | UBE1C | AF046024 | Above
67 | 41177_at | hypothetical protein FLJ12443 | FLJ12443 | AW024285 | Above
68 | 32736_at | HSPC022 protein | HSPC022 | W68830 | Above
69 | 1928_s_at | mad protein homolog Smad2 gene | Smad2 | U78733 | Below
70 | 1081_at | ornithine decarboxylase 1 | ODC1 | M33764 | Above
71 | 37345_at | Calumenin | CALU | AF013759 | Above
72 | 34099_f_at | nucleosome assembly protein 1-like 1 | NAP1L1 | [illegible] | Above
73 | 933_f_at | zinc finger protein 91 HPF7 HTF10 | ZNF91 | L11672 | Below
74 | 32214_at | thioredoxin-like 32kD | TXNL | AF003938 | Below
75 | 33501_r_at | SNC73 protein SNC73 mRNA complete cds | - | S71043 | Below
76 | 950_at | translocation protein 1 | TLOC1 | D87127 | Below
77 | 41161_at | death-associated protein 6 | DAXX | AB015051 | Below
78 | 41381_at | KIAA0308 protein | KIAA0308 | AB002306 | Below
79 | 38705_at | ubiquitin-conjugating enzyme E2D 2 homologous to yeast UBC4/5 | UBE2D2 | AI310002 | Above
80 | 38617_at | LIM domain kinase 2 | LIMK2 | D45906 | Below
81 | 34305_at | poly rC binding protein 1 | PCBP1 | Z29505 | Above
82 | 40436_g_at | solute carrier family 25 mitochondrial carrier adenine nucleotide translocator member 6 | SLC25A6 | J03592 | Above
83 | 1827_s_at | c-myc-P64 mRNA, initiating from promoter P0 | - | M13929 | Above
84 | 38479_at | acidic protein rich in leucines | SSP29 | Y07969 | Below
85 | 33207_at | DnaJ Hsp40 homolog subfamily C member 3 | DNAJC3 | AI095508 | Below
86 | 39039_s_at | CGI-76 protein | LOC51632 | AI557497 | Below
87 | 32157_at | protein phosphatase 1 catalytic subunit alpha isoform | PPP1CA | S57501 | Above
88 | 905_at | guanylate kinase 1 | GUK1 | L76200 | Below
89 | 35794_at | KIAA0942 protein | KIAA0942 | AB023159 | Below
90 | 1007_s_at | discoidin domain receptor family member 1 | DDR1 | U48705 | Below
91 | 39424_at | tumor necrosis factor receptor superfamily member 14 herpesvirus entry mediator | TNFRSF14 | U70321 | Below
92 | 36634_at | BTG family member 2 | BTG2 | U72649 | Below
93 | 38760_f_at | butyrophilin subfamily 3 member A2 | BTN3A2 | U90546 | Below



Table 13. Genes selected by CFS for Novel Class

No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Above/Below Mean
1 | 37960_at | carbohydrate chondroitin 6/keratan sulfotransferase 2 | CHST2 | AB014679 | Above
2 | 31892_at | protein tyrosine phosphatase receptor type M | PTPRM | X58288 | Above
3 | [illegible] | protein tyrosine phosphatase receptor type M | PTPRM | X58288 | Above
4 | 995_g_at | protein tyrosine phosphatase receptor type M | PTPRM | X58288 | Above
5 | 41074_at | G protein-coupled receptor 49 | GPR49 | AF062006 | Above
6 | 41073_at | G protein-coupled receptor 49 | GPR49 | AI743745 | Above
7 | 34676_at | KIAA1099 protein | KIAA1099 | AB029022 | Above
8 | 36139_at | DKFZP586G0522 protein | DKFZP586G0522 | AL050289 | Above
9 | 37542_at | lipoma HMGIC fusion partner-like 2 | LHFPL2 | D86961 | Above
10 | 41159_at | clathrin heavy polypeptide Hc | CLTC | D21260 | Above
11 | 32800_at | retinoid X receptor alpha mRNA | - | U66306 | Above
12 | 1664_at | insulin-like growth factor 2 | IGF2 | HG3543-HT3739 | Above
13 | 36566_at | cystinosis nephropathic | CTNS | AJ222967 | Above



Table 14. Gene selected by CFS for T-ALL

No. | Affymetrix number | Gene Name | GeneSymbol | Reference number | Above/Below Mean
1 | 38319_at | CD3D antigen delta polypeptide TiT3 complex | CD3D | AA919102 | Above



Affymetrix 
number 

1 38652_at 

2 36239_at 

3 41442_at 

4 37780_at 

5 36985_at 

6 38578__at 

7 35614_at 

8 32224_at 

9 32730_at 

10 36937_s_at 

11 36008_at 

12 41200 at 



Table 15. Genes selected by CFS for TEL-AML1 

Gene Name GeneSymbol Reference 

number 

hypothetical protein FLJ20 1 54 FLJ20 1 54 AF070644 

POU domain class 2 associating POU2AF1 Z49 1 94 
factor 1 

core-binding factor runt domain alpha CBFA2T3 ABO 1 04 1 9 
subunit 2 translocated to 3 

piccolo presynaptic cytomatrix PCLO ABO 11131 
protein 

isopentenyl-diphosphate delta EDI 1 X 1 7025 
isomerase 

tumor necrosis factor receptor TNFRSF7 M63928 
superfamily member 7 

transcription factor-like 5 basic helix- TCFL5 ABO 1 2 1 24 
loop-helix 

KIAA0769 gene product KIAA0769 AB018312 

KIAA1750 protein AL080059 

PDZ and LIM domain 1 elfm PDLIM1 U90878 

protein tyrosine phosphatase type IVA PTP4A3 AF041434 
member 3 



CD36 antigen collagen type I receptor CD36L1 
thrombospondin receptor like 1 



Z22555 



Above/ 
Below 
Mean 

Above 

Above 
Above 

Above 
Above 
Above 

Above 

Above 
Above 
Below 
Above 

Above 



-60- 



BNSDOCID: <WO 03083 140A2_I_> 



WO 03/083140 



PCT/US03/08486 



13 | 33690_at | DKFZp434A202 from clone DKFZp434A202 | | AL080190 | Above
14 | 755_at | inositol 1 4 5-triphosphate receptor type 1 | ITPR1 | D26070 | Above
15 | 41097_at | telomeric repeat binding factor 2 | TERF2 | AF002999 | Above
16 | 160029_at | protein kinase C beta 1 | PRKCB1 | X07109 | Above
17 | 34481_at | vav proto-oncogene | Vav | AF030227 | Above
18 | 41498_at | KIAA0911 protein | KIAA0911 | AB020718 | Above
19 | 37280_at | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59912 | Above
20 | 1647_at | IQ motif containing GTPase activating protein 2 | IQGAP2 | U51903 | Below
21 | 37724_at | v-myc avian myelocytomatosis viral oncogene homolog | MYC | V00568 | Below
22 | 37981_at | drebrin 1 | DBN1 | U00802 | Above
23 | 37326_at | proteolipid protein 2 colonic epithelium-enriched | PLP2 | U93305 | Below
24 | 37344_at | major histocompatibility complex class II DM alpha | HLA-DMA | X62744 | Above
25 | 38666_at | pleckstrin homology Sec7 and coiled/coil domains 1 cytohesin 1 | PSCD1 | M85169 | Below
26 | 39039_s_at | CGI-76 protein | LOC51632 | AI557497 | Below
27 | 34819_at | CD164 antigen sialomucin | CD164 | D14043 | Below
28 | 40729_s_at | nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor-like 1 | NFKBIL1 | Y14768 | Above
29 | 34224_at | fatty acid desaturase 3 | FADS3 | AC004770 | Above
30 | 39827_at | hypothetical protein | FLJ20500 | AA522530 | Below
31 | 32157_at | protein phosphatase 1 catalytic subunit alpha isoform | PPP1CA | S57501 | Below
32 | 34183_at | DKFZP434C171 protein | DKFZP434C171 | AL080169 | Below
33 | 39329_at | actinin alpha 1 | ACTN1 | X15804 | Below
34 | 38124_at | midkine neurite growth-promoting factor 2 | MDK | X55110 | Above
35 | 33304_at | interferon stimulated gene 20kD | ISG20 | U88964 | Above
36 | 41295_at | GTT1 protein | GTT1 | AL041780 | Below
37 | 40745_at | adaptor-related protein complex 1 beta 1 subunit | AP1B1 | L13939 | Above
38 | 38906_at | spectrin alpha erythrocytic 1 elliptocytosis 2 | SPTA1 | M61877 | Above
39 | 263_g_at | S-adenosylmethionine decarboxylase 1 | AMD1 | M21154 | Below
40 | 41609_at | major histocompatibility complex class II DM beta | HLA-DMB | U15085 | Above
41 | 39045_at | hypothetical protein FLJ21432 | FLJ21432 | W26655 | Below






42 | 39421_at | runt-related transcription factor 1 acute myeloid leukemia 1 aml1 | RUNX1 | D43969 | Above
43 | 34210_at | CDW52 antigen CAMPATH-1 antigen | CDW52 | N90866 | Above
44 | 37276_at | IQ motif containing GTPase activating protein 2 | IQGAP2 | U51903 | Below
45 | 38763_at | L-iditol-2 dehydrogenase gene | | L29254 | Below
46 | 40960_at | UDP-Gal betaGlcNAc beta 1 4-galactosyltransferase polypeptide 1 | B4GALT1 | D29805 | Below
47 | 1127_at | ribosomal protein S6 kinase 90kD polypeptide 1 | RPS6KA1 | L07597 | Below
48 | [illegible] | KIAA0102 gene product | KIAA0102 | D14658 | Below
49 | [illegible] | SH3-domain binding protein 5 BTK-associated | SH3BP5 | AB005047 | Below
50 | 39135_at | KIAA0767 protein | KIAA0767 | AB018310 | Below
51 | 36128_at | transmembrane trafficking protein | TMP21 | L40397 | Below
52 | 1158_s_at | calmodulin 3 phosphorylase kinase delta | CALM3 | J04046 | Above
53 | 34782_at | jumonji mouse homolog | JMJ | AL021938 | Below
54 | 37893_at | protein tyrosine phosphatase non-receptor type 2 | PTPN2 | AI828880 | Below
55 | 39758_f_at | Lysosomal-associated membrane protein 1 | LAMP1 | J04182 | Below
56 | 35151_at | tumor suppressor deleted in oral cancer-related 1 | DOC-1R | AF089814 | Below
57 | 38096_f_at | major histocompatibility complex class II DP beta 1 | HLA-DPB1 | M83664 | Above
58 | [illegible] | succinate dehydrogenase complex subunit D integral membrane protein | SDHD | AB006202 | Below
59 | [illegible] | S100 calcium-binding protein A13 | S100A13 | AJ541308 | Below
60 | 41812_s_at | KIAA0906 protein | KIAA0906 | AB020713 | Below
61 | 34336_at | lysyl-tRNA synthetase | KARS | D32053 | Below
62 | 38336_at | KIAA1013 protein | KIAA1013 | AB023230 | Below
63 | 32253_at | arginine-glutamic acid dipeptide RE repeats | RERE | AB007927 | Below
64 | 35731_at | integrin alpha 4 antigen CD49D alpha 4 subunit of VLA-4 receptor | ITGA4 | X16983 | Below
65 | [illegible] | C-type calcium dependent carbohydrate-recognition domain lectin superfamily member 2 activation-induced | CLECSF2 | X96719 | Below
66 | 840_at | zinc finger protein 220 | ZNF220 | U47742 | Above
67 | 41171_at | proteasome prosome macropain activator subunit 2 PA28 beta | PSME2 | D45248 | Above
68 | 34877_at | Janus kinase 1 a protein tyrosine kinase | JAK1 | AL039831 | Above
69 | 37190_at | WAS protein family member 1 | WASF1 | D87459 | Below
70 | 31690_at | Glutamate dehydrogenase-2 | GLUD2 | U08997 | Below
71 | 40961_at | SWI/SNF related matrix associated actin dependent regulator of chromatin subfamily a member 2 | SMARCA2 | X72889 | Below
72 | 38149_at | KIAA0053 gene product | KIAA0053 | D29642 | Above
73 | 2061_at | integrin alpha 4 antigen CD49D alpha 4 subunit of VLA-4 receptor | ITGA4 | L12002 | Below
74 | 2012_s_at | protein kinase DNA-activated catalytic polypeptide | PRKDC | U34994 | Below
75 | 36878_f_at | major histocompatibility complex class II DQ beta 1 | HLA-DQB1 | M60028 | Above
76 | 34821_at | DKFZP586D0623 protein | DKFZP586D06 | AL050197 | Below
77 | 36980_at | proline-rich protein with nuclear targeting signal | B4-2 | U03105 | Below
78 | 853_at | nuclear factor erythroid-derived 2 like 2 | NFE2L2 | S74017 | Below
79 | [illegible] | [illegible] protease [illegible] | CASP1 | U13697 | Below
80 | 32572_at | ubiquitin specific protease 9 X chromosome Drosophila fat facets related | USP9X | X98296 | Below
81 | 387_at | cyclin-dependent kinase 9 CDC2-related kinase | CDK9 | X80230 | Below
82 | 35300_at | glutamyl-prolyl-tRNA synthetase | EPRS | X54326 | Below
83 | 36155_at | KIAA0275 gene product | KIAA0275 | D87465 | Below
84 | 37625_at | Interferon regulatory factor 4 | IRF4 | U52682 | Below
85 | 35763_at | KIAA0540 protein | KIAA0540 | AB011112 | Below
86 | 39077_at | DR1-associated protein 1 negative cofactor 2 alpha | DRAP1 | U41843 | Below
87 | 40132_g_at | Follistatin-like 1 | FSTL1 | D89937 | Below
88 | 32615_at | aspartyl-tRNA synthetase | DARS | J05032 | Below
89 | 38357_at | Homo sapiens mRNA cDNA DKFZp564D156 from clone DKFZp564D156 | | AL049321 | Above
90 | 34817_s_at | ataxin 2 related protein | A2LP | U70671 | Above
91 | 40856_at | serine or cysteine proteinase inhibitor clade F alpha-2 antiplasmin pigment epithelium derived factor member 1 | SERPINF1 | U29953 | Below
92 | 39784_at | eukaryotic translation initiation factor 2 subunit 1 alpha 35kD | EIF2S1 | U26032 | Below
93 | 37600_at | extracellular matrix protein 1 | ECM1 | U68186 | Below
94 | 40839_at | ubiquitin-like 3 | UBL3 | AL080177 | Below
95 | 34832_s_at | KIAA0763 gene product | KIAA0763 | AB018306 | Below
96 | 33244_at | chimerin chimaerin 2 | CHN2 | U07223 | Below
97 | 31516_f_at | basic transcription factor 3 like 1 | BTF3L1 | M90354 | Below
98 | 35266_at | bladder cancer associated protein | BLCAP | AL049288 | Above
99 | 253_g_at | (clone GPCR W) G protein-linked receptor gene (GPCR) gene | | L42324 | Below
100 | 35227_at | retinoblastoma-binding protein 8 | RBBP8 | U72066 | Below
101 | 41073_at | G protein-coupled receptor 49 | GPR49 | AI743745 | Below
102 | 38084_at | chromobox homolog 3 Drosophila HP1 gamma | CBX3 | AI797801 | Below
103 | 39025_at | 6.2 kd protein | LOC54543 | AI557912 | Below
104 | 32085_at | KIAA0981 protein | KIAA0981 | AB023198 | Above
105 | 38902_r_at | Activating transcription factor 2 | ATF2 | X15875 | Below



3. T-statistics

The t-statistic is a classical feature selection approach. The t-statistic of a gene is
defined as T = |μ1 - μ2| / sqrt(σ1^2/n1 + σ2^2/n2), where μi is the mean expression of that
gene in the i-th class, σi^2 is the variance of that gene in the i-th class, and ni is the size of
the i-th class. This formula assigns a higher value to a gene that has a larger mean
difference between the two classes and a smaller variance within both classes. For
BCR-ABL, hyperdiploid >50, MLL, Novel, and TEL-AML1, the 40 top-ranked genes
are listed in Tables 16, 18, 19, 20, and 22, whereas for E2A-PBX1 and T-ALL only
the top 30 and 31 genes are shown. Additional genes that may be used in expression
profiles to assign subjects to a leukemia risk group are shown in Tables 54-60. The
genes in Tables 54-60 were selected on the basis of having a T-statistic value greater
than the T-statistic value for the gene when examined as a discriminator in 999 of 1000
permutations of the data set (p = 0.001; this statistical test is described elsewhere
herein). Of these genes, only those having a T-statistic absolute value equal to or
greater than 8 (representing a nominal p value of <0.0001) are shown in Tables 54-60.
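
By way of illustration only, the T-statistic defined above, the ranking of genes by that statistic, and a comparison against permuted data might be sketched as follows. The function names, the toy expression values, the use of the sample variance, and the choice to permute by shuffling class labels are all assumptions made for this sketch; the permutation test actually used is described elsewhere herein.

```python
import math
import random
from statistics import mean, variance


def split_by_class(values, in_class):
    """Split one gene's expression values into (in-class, out-of-class) lists."""
    inside = [v for v, flag in zip(values, in_class) if flag]
    outside = [v for v, flag in zip(values, in_class) if not flag]
    return inside, outside


def t_statistic(class_1, class_2):
    """T = |mean_1 - mean_2| / sqrt(var_1/n_1 + var_2/n_2) for one gene."""
    # Assumption: sample variance (n - 1 denominator); the text does not
    # say which variance estimator was used.
    standard_error = math.sqrt(
        variance(class_1) / len(class_1) + variance(class_2) / len(class_2)
    )
    return abs(mean(class_1) - mean(class_2)) / standard_error


def rank_genes(expression, in_class, top_k=20):
    """Return the top_k (gene, T) pairs, largest T first."""
    scores = {
        gene: t_statistic(*split_by_class(values, in_class))
        for gene, values in expression.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:top_k]


def passes_permutation_test(values, in_class, n_permutations=1000,
                            required=999, seed=0):
    """True if the observed T exceeds the T of at least `required` label shuffles."""
    rng = random.Random(seed)
    observed = t_statistic(*split_by_class(values, in_class))
    labels = list(in_class)
    wins = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)  # hypothetical permutation scheme: shuffle class labels
        if observed > t_statistic(*split_by_class(values, labels)):
            wins += 1
    return wins >= required


# Hypothetical toy data: 8 samples of one subtype followed by 12 other samples.
in_subtype = [True] * 8 + [False] * 12
expression = {
    "gene_A": [9.1, 8.7, 9.4, 9.0, 8.9, 9.3, 8.8, 9.2,
               5.2, 5.9, 4.8, 5.5, 5.1, 5.6, 5.0, 5.4, 5.7, 4.9, 5.3, 5.8],
    "gene_B": [5.0, 6.1, 4.2, 5.7, 6.6, 4.9, 5.3, 6.0,
               5.1, 6.2, 4.3, 5.8, 6.5, 4.8, 5.4, 5.9, 4.6, 6.3, 5.2, 5.6],
}
ranking = rank_genes(expression, in_subtype, top_k=1)
```

In this toy example gene_A separates the two classes cleanly and is ranked first, whereas gene_B has nearly equal class means and would not pass the permutation comparison. The tables that follow report signed T-stat values together with an Above/Below Mean column, so a signed variant (omitting the absolute value) would be needed to reproduce that presentation.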

Generally, using the top 20-40 genes did not result in significant changes to
subtype prediction accuracy. Accordingly, the top 20 genes were used for subtype
prediction, unless noted otherwise.

Table 16. Genes Selected by T statistics for BCR-ABL

No. | Affymetrix number | Gene Name | Gene Symbol | Reference number | T-stat value | Above/Below Mean
1 | 32319_at | tumor necrosis factor ligand superfamily member 4 tax-transcriptionally activated glycoprotein 1 34kD | TNFSF4 | AL022310 | 12.0346 | Above
2 | 36194_at | low density lipoprotein-related protein-associated protein 1 alpha-2-macroglobulin receptor-associated protein 1 | LRPAP1 | M63959 | -11.3077 | Below
3 | 1211_s_at | CASP2 and RIPK1 domain containing adaptor with death domain | CRADD | U84388 | 10.6627 | Above
4 | 37397_at | Homo sapiens platelet/endothelial cell adhesion molecule-1 (PECAM-1) gene, exon 16 and complete cds. | PECAM | L34657 | 10.2460 | Above
5 | 330_s_at | tubulin, alpha 1, isoform 44 | TUBA1 | HG2259-HT2348 | 10.0540 | Above
6 | 33774_at | caspase 8 apoptosis-related cysteine protease | CASP8 | X98172 | 9.9147 | Above
7 | 202_at | heat shock transcription factor 2 | HSF2 | M65217 | -9.7639 | Below
8 | 1558_g_at | p21/Cdc42/Rac1-activated kinase 1 yeast Ste20-related | PAK1 | U24152 | 9.6562 | Above
9 | 39691_at | SH3-containing protein SH3GLB1 | SH3GLB1 | AB007960 | 9.5307 | Above
10 | 2045_s_at | hemopoietic cell kinase | HCK | M16592 | -9.3898 | Below
11 | 36591_at | tubulin alpha 1 testis specific | TUBA1 | X06956 | 9.3382 | Above
12 | 1386_at | protein tyrosine phosphatase non-receptor type 9 | PTPN9 | [illegible] | [illegible] | Below
13 | 35991_at | Sm protein F | LSM6 | AA917945 | 9.0298 | Above
14 | 41273_at | FK506 binding protein 12-rapamycin associated protein 1 | FRAP1 | AL046940 | 8.9732 | Above
15 | 35970_g_at | M-phase phosphoprotein 9 | MPHOSPH9 | N23137 | 8.6474 | Above
16 | 38636_at | immunoglobulin superfamily containing leucine-rich repeat | ISLR | AB003184 | 8.4291 | Above
17 | 36683_at | matrix Gla protein | MGP | AI953789 | -8.3872 | Below
18 | 39070_at | singed Drosophila like sea urchin fascin homolog like | SNL | U03057 | 8.2583 | Above
19 | 40798_s_at | a disintegrin and metalloproteinase domain 10 | ADAM10 | Z48579 | 8.2283 | Above
20 | 41649_at | FOXJ2 forkhead factor | LOC55810 | AF038177 | 8.2275 | Above
21 | 38966_at | glycoprotein synaptic 2 | GPSN2 | AF038958 | 8.2080 | Above
22 | 34759_at | Human hbc647 mRNA sequence | | U68494 | 8.1863 | Above
23 | 1434_at | phosphatase and tensin homolog mutated in multiple advanced cancers 1 | PTEN | U92436 | 8.1671 | Above
24 | 40167_s_at | CS box-containing WD protein | LOC55884 | AF038187 | 8.1655 | Above
25 | 40264_g_at | zinc finger protein-like 1 | ZFPL1 | AF001891 | 8.1384 | Above
26 | 36129_at | KIAA0397 gene product | | [illegible] | 8.0041 | Above
27 | 551_at | E1A binding protein p300 | EP300 | U01877 | -7.7578 | Below
28 | 38345_at | centrosomal protein 1 | CEP1 | AF083322 | -7.7431 | Below
29 | [illegible] | myosin phosphatase target subunit 2 | MYPT2 | AB007972 | -7.7301 | Below
30 | 39068_at | protein phosphatase 2 regulatory subunit B B56 delta isoform | PPP2R5D | [illegible] | -7.6161 | Below
31 | [illegible] | lymphocyte antigen 75 | LY75 | AF011333 | 7.5830 | Above
32 | [illegible] | ribonucleotide reductase M1 polypeptide | RRM1 | X59543 | 7.5778 | Above
33 | 39519_at | KIAA0692 protein | KIAA0692 | AB014592 | 7.4662 | Above
34 | 32788_at | RAN binding protein 2 | RANBP2 | D42063 | 7.4114 | Above
35 | 34882_at | nucleolar protein KKE/D repeat | NOP56 | Y12065 | 7.3622 | Above
36 | 2064_g_at | excision repair cross-complementing rodent repair deficiency complementation group 5 | ERCC5 | L20046 | 7.3597 | Above
37 | 41836_at | protein with polyglutamine repeat [illegible] endoplasmic reticulum protein | ERPROT213-21 | U94836 | 7.3350 | Above
38 | 1563_s_at | tumor necrosis factor receptor superfamily member 1A | TNFRSF1A | M58286 | 7.3039 | Above
39 | 37047_at | Niemann-Pick disease type C1 | NPC1 | AF002020 | 7.2357 | Above
40 | 32724_at | phytanoyl-CoA hydroxylase Refsum disease | PHYH | AF023462 | -7.2252 | Below



Table 17. Genes Selected by T statistics for E2A-PBX1

No. | Affymetrix number | Gene Name | Gene Symbol | Reference number | T-stat value | Above/Below Mean
1 | 32063_at | pre-B-cell leukemia transcription factor 1 | PBX1 | M86546 | 126.7442 | Above
2 | 33355_at | Homo sapiens cDNA FLJ12900 fis clone NT2RP2004321 (by CELERA search of target sequence = PBX1) | PBX1 | AL049381 | 36.6116 | Above
3 | 40454_at | FAT tumor suppressor Drosophila homolog | FAT | X87241 | 30.7577 | Above
4 | 717_at | GS3955 protein | GS3955 | D87119 | 23.7813 | Above
5 | 39070_at | singed Drosophila like sea urchin fascin homolog like | SNL | U03057 | -22.8956 | Below
6 | 33641_g_at | nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor-like 1 | NFKBIL1 | Y14768 | -20.4637 | Below
7 | 36536_at | schwannomin interacting protein 1 | SCHIP-1 | AF070614 | -20.1554 | Below
8 | 854_at | B lymphoid tyrosine kinase | BLK | S76617 | 19.6467 | Above
9 | 37625_at | interferon regulatory factor 4 | IRF4 | U52682 | 18.8419 | Above
10 | 39614_at | KIAA0802 protein | KIAA0802 | AB018345 | 17.8214 | Above
11 | 37099_at | arachidonate 5-lipoxygenase-activating protein | ALOX5AP | AI806222 | -17.7944 | Below
12 | 38994_at | STAT induced STAT inhibitor-2 | STATI2 | AF037989 | -17.6553 | Below
13 | 37641_at | Human gene for hepatitis C-associated microtubular aggregate protein p44, exon 9 and complete cds. | | D28915 | -17.3074 | Below
14 | 40113_at | GS3955 protein | GS3955 | D87119 | 16.7288 | Above
15 | 2031_s_at | cyclin-dependent kinase inhibitor 1A p21 Cip1 | CDKN1A | U03106 | -14.9826 | Below
16 | 330_s_at | tubulin, alpha 1, isoform 44 | TUBA1 | HG2259-HT2348 | -14.8016 | Below
17 | 38340_at | huntingtin interacting protein-1-related | KIAA0655 | AB014555 | 14.7180 | Above
18 | 38510_at | Homo sapiens mRNA cDNA DKFZp586B0220 | | AL049435 | -14.4522 | Below
19 | 268_at | Homo sapiens platelet/endothelial cell adhesion molecule-1 (PECAM-1) gene, exon 16 and complete cds. | PECAM | L34657 | -13.7540 | Below
20 | 2062_at | insulin-like growth factor binding protein 7 | IGFBP7 | L19182 | 13.6403 | Above
21 | 37893_at | protein tyrosine phosphatase non-receptor type 2 | PTPN2 | AI828880 | 13.5099 | Above
22 | 38580_at | guanine nucleotide binding protein G protein q polypeptide | GNAQ | U43083 | -12.8525 | Below
23 | 40049_at | death-associated protein kinase 1 | DAPK1 | X76104 | -12.3837 | Below
24 | 38393_at | KIAA0247 gene product | KIAA0247 | D87434 | 12.3436 | Above
25 | 39379_at | Homo sapiens mRNA cDNA DKFZp586C1019 | | AL049397 | 12.2102 | Above
26 | 430_at | nucleoside phosphorylase | NP | X00737 | 12.1307 | Above
27 | 37975_at | cytochrome b-245 beta polypeptide chronic granulomatous disease | CYBB | X04011 | -12.0743 | Below
28 | 34862_at | CGI-49 protein | LOC51097 | AA005018 | 12.0264 | Above
29 | 39756_g_at | X-box binding protein 1 | XBP1 | Z93930 | -11.9796 | Below
30 | 307_at | arachidonate 5-lipoxygenase | ALOX5 | J03600 | -11.9492 | Below
31 | 37304_at | chromobox homolog 1 Drosophila HP1 beta | CBX1 | U35451 | 11.9422 | Above
32 | 1287_at | ADP-ribosyltransferase NAD poly ADP-ribose polymerase | ADPRT | J03473 | 11.9051 | Above
33 | 1520_s_at | interleukin 1 beta | IL1B | X04500 | 11.7327 | Above
34 | 596_s_at | colony stimulating factor 3 receptor granulocyte | CSF3R | M59820 | -11.6814 | Below
35 | 37493_at | colony stimulating factor 2 receptor beta low-affinity granulocyte-macrophage | CSF2RB | H04668 | 11.6620 | Above
36 | 36452_at | synaptopodin | KIAA1029 | AB028952 | 11.4021 | Above
37 | 1081_at | ornithine decarboxylase 1 | ODC1 | M33764 | 11.2865 | Above
38 | 1563_s_at | tumor necrosis factor receptor superfamily member 1A | TNFRSF1A | M58286 | -11.1361 | Below
39 | 39069_at | AE-binding protein 1 | AEBP1 | AF053944 | 11.0984 | Above
40 | [illegible] | ornithine decarboxylase 1 | ODC1 | X16277 | 10.9475 | Above






Table 18. Genes Selected by T statistics for Hyperdiploid > 50

No. | Affymetrix number | Gene Name | Gene Symbol | Reference number | T-stat value | Above/Below Mean
1 | 36620_at | superoxide dismutase 1 soluble amyotrophic lateral sclerosis 1 adult | SOD1 | [illegible] | 9.1574 | Above
2 | 39878_at | protocadherin 9 | PCDH9 | AI524125 | -6.9008 | Below
3 | 37543_at | Rac/Cdc42 guanine exchange factor GEF 6 | ARHGEF6 | D25304 | 6.8366 | Above
4 | 41470_at | prominin mouse like 1 | | | 6.7290 | Above
5 | 31492_at | muscle specific gene | M9 | AB019392 | -6.6885 | Below
6 | 38968_at | SH3-domain binding protein 5 BTK-associated | SH3BP5 | AB005047 | 6.4051 | Above
7 | 1915_s_at | v-fos FBJ murine osteosarcoma viral oncogene homolog | FOS | V01512 | 6.4008 | Above
8 | 37677_at | phosphoglycerate kinase 1 | PGK1 | V00572 | 6.2865 | Above
9 | 39867_at | Tu translation elongation factor mitochondrial | TUFM | S75463 | -6.2299 | Below
10 | 36795_at | prosaposin variant Gaucher disease and variant metachromatic leukodystrophy | PSAP | J03077 | 6.1812 | Above
11 | 40875_s_at | small nuclear ribonucleoprotein 70kD polypeptide RNP antigen | SNRP70 | X06815 | -6.0877 | Below
12 | 306_s_at | high-mobility group nonhistone chromosomal protein 14 | HMG14 | J02621 | 6.0804 | Above
13 | 41724_at | accessory proteins BAP31/BAP29 | DXS1357E | [illegible] | 6.0244 | Above
14 | 39168_at | Ac-like transposable element | ALTE | AB018328 | 5.9336 | Above
15 | 955_at | calmodulin type I | CALM1 | HG1862-HT1897 | 5.8650 | Above
16 | 38604_at | neuropeptide Y | NPY | AI198311 | 5.8313 | Above
17 | 39147_g_at | alpha thalassemia/mental retardation syndrome X-linked RAD54 S. cerevisiae homolog | ATRX | U72936 | [illegible] | Above
18 | 39069_at | AE-binding protein 1 | AEBP1 | AF053944 | -5.6901 | Below
19 | 37014_at | myxovirus influenza resistance 1 homolog of murine interferon-inducible protein p78 | MX1 | M33882 | 5.6688 | Above
20 | 1520_s_at | interleukin 1 beta | IL1B | X04500 | 5.6605 | Above
21 | 1488_at | protein tyrosine phosphatase receptor type K | PTPRK | L77886 | -5.5877 | Below
22 | 32553_at | MYC-associated zinc finger protein purine-binding transcription factor | MAZ | M94046 | -5.5000 | Below
23 | 36169_at | NADH dehydrogenase ubiquinone 1 alpha subcomplex 1 7.5kD MWFE | NDUFA1 | N47307 | 5.4376 | Above
24 | 1817_at | prefoldin 5 | PFDN5 | D89667 | -5.4110 | Below
25 | 578_at | Human recombination activating protein (RAG2) gene, last exon | RAG2 | M94633 | -5.4026 | Below
26 | 1556_at | RNA binding motif protein 5 | RBM5 | U23946 | -5.3032 | Below
27 | 40998_at | trinucleotide repeat containing 11 THR-associated protein 230 kDa subunit | TNRC11 | AF071309 | 5.2349 | Above
28 | 37294_at | B-cell translocation gene 1 anti-proliferative | BTG1 | X61123 | -5.1877 | Below
29 | 1447_at | proteasome prosome macropain subunit beta type 1 | PSMB1 | D00761 | 5.1699 | Above
30 | 35940_at | POU domain class 4 transcription factor 1 | POU4F1 | X64624 | 5.1200 | Above
31 | 33307_at | kraken-like | BK126B4.1 | AL022316 | -5.0984 | Below
32 | 1081_at | ornithine decarboxylase 1 | ODC1 | M33764 | -5.0822 | Below
33 | 34336_at | lysyl-tRNA synthetase | KARS | D32053 | -5.0692 | Below
34 | 41143_at | Human calmodulin (CALM1) gene, exons 2,3,4,5 and 6, and complete cds | CALM1 | U12022 | 5.0543 | Above
35 | 32251_at | hypothetical protein FLJ21174 | FLJ21174 | AA149307 | 5.0373 | Above
36 | 35298_at | eukaryotic translation initiation factor 3 subunit 7 zeta 66/67kD | EIF3S7 | U54558 | -4.9499 | Below
37 | 38649_at | KIAA0970 protein | KIAA0970 | AB023187 | -4.9228 | Below
38 | 36629_at | glucocorticoid-induced leucine zipper | GILZ | AI635895 | 4.8061 | Above
39 | 39721_at | ephrin-B1 | EFNB1 | U09303 | 4.7968 | Above
40 | 2094_s_at | v-fos FBJ murine osteosarcoma viral oncogene homolog | FOS | K00650 | 4.7446 | Above



Table 19. Genes Selected by T statistics for MLL

No. | Affymetrix number | Gene Name | Gene Symbol | Reference number | T-stat value | Above/Below Mean
1 | 307_at | arachidonate 5-lipoxygenase | ALOX5 | J03600 | -16.8244 | Below
2 | 37280_at | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59912 | -15.4460 | Below
3 | 1520_s_at | interleukin 1 beta | IL1B | X04500 | -13.6764 | Below
4 | 36908_at | Human macrophage mannose receptor (MRC1) gene, exon 30. | MRC1 | M93221 | -11.8629 | Below
5 | 33412_at | LGALS1 Lectin, galactoside-binding, soluble, 1 (galectin 1) | LGALS1 | AI535946 | 11.0223 | Above
6 | 2062_at | insulin-like growth factor binding protein 7 | IGFBP7 | L19182 | 10.4318 | Above
7 | 35940_at | POU domain class 4 transcription factor 1 | POU4F1 | X64624 | -10.1815 | Below
8 | 39721_at | ephrin-B1 | EFNB1 | U09303 | -9.6158 | Below
9 | 39402_at | interleukin 1 beta | IL1B | M15330 | -9.5998 | Below
10 | 1737_s_at | insulin-like growth factor-binding protein 4 | IGFBP4 | M62403 | -9.4119 | Below
11 | 37413_at | dipeptidase 1 renal | DPEP1 | J05257 | -9.4101 | Below
12 | 40519_at | protein tyrosine phosphatase receptor type C | PTPRC | Y00638 | 9.3163 | Above
13 | 1971_g_at | fragile histidine triad gene | FHIT | [illegible] | [illegible] | Below
14 | 1983_at | cyclin D2 | CCND2 | X68452 | -9.2213 | Below
15 | 38869_at | KIAA1069 protein | KIAA1069 | AB028992 | -9.1951 | Below
16 | 40520_g_at | protein tyrosine phosphatase receptor type C | PTPRC | Y00638 | 9.1099 | Above
17 | 1718_at | actin related protein 2/3 complex subunit 2 34 kD | ARPC2 | U50523 | 9.0435 | Above
18 | 34237_at | HBS1 S. cerevisiae like | HBS1L | [illegible] | [illegible] | Below
19 | 1726_at | DNA polymerase, epsilon, catalytic subunit | | HG919-HT919 | -8.4664 | Below
20 | 36643_at | discoidin domain receptor family member 1 | DDR1 | L20817 | -8.4627 | Below
21 | 1325_at | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59423 | -8.3762 | Below
22 | 39379_at | Homo sapiens mRNA cDNA DKFZp586C1019 | | AL049397 | 8.2974 | Above
23 | 36536_at | schwannomin interacting protein 1 | SCHIP-1 | AF070614 | -8.1177 | Below
24 | 564_at | guanine nucleotide binding protein G protein alpha 11 Gq class | GNA11 | M69013 | -8.1107 | Below
25 | 39705_at | KIAA0700 protein | KIAA0700 | AB014600 | -7.9334 | Below
26 | 36105_at | Human nonspecific crossreacting antigen mRNA, complete cds. | NCA | [illegible] | [illegible] | Below
27 | 174_s_at | intersectin 2 | ITSN2 | U61167 | 7.5752 | Above
28 | 39114_at | decidual protein induced by progesterone | DEPP | AB022718 | -7.4767 | Below
29 | 40436_g_at | solute carrier family 25 mitochondrial carrier adenine nucleotide translocator member 6 | SLC25A6 | J03592 | 7.3952 | Above
30 | 794_at | protein tyrosine phosphatase non-receptor type 6 | PTPN6 | [illegible] | [illegible] | Above
31 | 38032_at | KIAA0736 gene product | KIAA0736 | AB018279 | -7.0718 | Below
32 | 40518_at | protein tyrosine phosphatase receptor type C | PTPRC | Y00062 | 6.9829 | Above
33 | 41762_at | TIA1 cytotoxic granule-associated RNA-binding protein-like 1 | TIAL1 | D64015 | -6.9118 | Below
34 | 1389_at | membrane metallo-endopeptidase neutral endopeptidase enkephalinase CALLA CD10 | MME | J03779 | -6.7734 | Below
35 | 39967_at | leucine zipper down-regulated in cancer 1 | LDOC1 | AB019527 | -6.7415 | Below
36 | 188_at | ephrin-B1 | EFNB1 | U09303 | -6.5964 | Below
37 | 160033_s_at | X-ray repair complementing defective repair in Chinese hamster cells 1 | XRCC1 | NM_006297 | -6.5936 | Below
38 | 40913_at | ATPase Ca transporting plasma membrane 4 | ATP2B4 | W28589 | -6.5774 | Below
39 | 37398_at | platelet/endothelial cell adhesion molecule CD31 antigen | PECAM1 | AA100961 | -6.5675 | Below
40 | 1488_at | protein tyrosine phosphatase receptor type K | PTPRK | L77886 | -6.5584 | Below



Table 20. Genes Selected by T statistics for Novel Risk Group

No. | Affymetrix number | Gene Name | Gene Symbol | Reference number | T-stat value | Above/Below Mean
1 | 41734_at | KIAA0870 protein | KIAA0870 | AB020677 | -40.5168 | Below
2 | 31892_at | protein tyrosine phosphatase receptor type M | PTPRM | X58288 | 33.4654 | Above
3 | 995_g_at | protein tyrosine phosphatase receptor type M | PTPRM | X58288 | 24.7557 | Above
4 | 34676_at | KIAA1099 protein | KIAA1099 | AB029022 | 14.0491 | Above
5 | 37908_at | guanine nucleotide binding protein 11 | GNG11 | U31384 | 11.4548 | Above
6 | 37960_at | carbohydrate chondroitin 6/keratan sulfotransferase 2 | CHST2 | AB014679 | 10.9971 | Above
7 | 33410_at | integrin alpha 6 | ITGA6 | S66213 | 10.0370 | Above
8 | 40585_at | adenylate cyclase 7 | ADCY7 | [illegible] | -9.5897 | Below
9 | 33284_at | myeloperoxidase | MPO | M19507 | -9.4724 | Below
10 | 41159_at | clathrin heavy polypeptide Hc | CLTC | D21260 | 9.4489 | Above
11 | 36591_at | tubulin alpha 1 testis specific | TUBA1 | X06956 | -9.1387 | Below
12 | 37712_g_at | MADS box transcription enhancer factor 2 polypeptide C myocyte enhancer factor 2C | MEF2C | S57212 | -9.1225 | Below
13 | 38576_at | H2B histone family member B | H2BFB | AJ223353 | -9.0869 | Below
14 | 38408_at | transmembrane 4 superfamily member 2 | TM4SF2 | L10373 | -8.7026 | Below
15 | 33907_at | eukaryotic translation initiation factor 4 gamma 3 | EIF4G3 | AF012072 | -8.3540 | Below
16 | 41273_at | FK506 binding protein 12-rapamycin associated protein 1 | FRAP1 | AL046940 | -8.3212 | Below
17 | 402_s_at | intercellular adhesion molecule 3 | ICAM3 | X69819 | -7.9741 | Below
18 | 35112_at | regulator of G-protein signalling 9 | RGS9 | AF071476 | 7.8348 | Above
19 | 34850_at | ubiquitin-conjugating enzyme E2E 3 homologous to yeast UBC4/5 | UBE2E3 | AB017644 | 7.8197 | Above
20 | 37030_at | KIAA0887 protein | KIAA0887 | AB020694 | -7.6343 | Below
21 | 36322_at | fucosyltransferase 7 alpha 1 3 fucosyltransferase | FUT7 | AB012668 | -7.6240 | Below
22 | 39509_at | Homo sapiens cDNA FLJ22071 | | AI692348 | -7.6232 | Below
23 | 40091_at | B-cell CLL/lymphoma 6 zinc finger protein 51 | BCL6 | U00115 | -7.6171 | Below
24 | 37280_at | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59912 | 7.5991 | Above
25 | 1325_at | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59423 | 7.5824 | Above
26 | 831_at | DEAD/H Asp-Glu-Ala-Asp/His box polypeptide 10 RNA helicase | DDX10 | U28042 | 7.4276 | Above
27 | 37600_at | extracellular matrix protein 1 | ECM1 | U68186 | -7.2991 | Below
28 | 41266_at | integrin alpha 6 | ITGA6 | X53586 | 7.2985 | Above
29 | 36958_at | zyxin | ZYX | X95735 | -7.2889 | Below
30 | 36564_at | Human DNA sequence from clone RP5-1174N9 on chromosome 1p34.1-35.3 | | W27419 | -7.2848 | Below
31 | 32174_at | solute carrier family 9 sodium/hydrogen exchanger isoform 3 regulatory factor 1 | SLC9A3R1 | AF015926 | -7.2749 | Below
32 | 619_s_at | membrane-spanning 4-domains subfamily A member 2 Fc fragment of IgE high affinity I receptor for beta polypeptide | MS4A2 | M27394 | -7.2325 | Below
33 | 40749_at | membrane-spanning 4-domains subfamily A member 2 Fc fragment of IgE high affinity I receptor for beta polypeptide | MS4A2 | X07203 | -7.2063 | Below
34 | 31894_at | centromere protein C 1 | CENPC1 | M95724 | 6.9679 | Above
35 | 32319_at | tumor necrosis factor ligand superfamily member 4 tax-transcriptionally activated glycoprotein 1 34kD | TNFSF4 | AL022310 | 6.8225 | Above
36 | 38259_at | syntaxin binding protein 2 | STXBP2 | AB002559 | -6.6992 | Below
37 | 35629_at | hypothetical protein | DJ1042K10.2 | AL022238 | -6.6968 | Below
38 | 38700_at | cysteine and glycine-rich protein 1 | CSRP1 | M33146 | -6.6962 | Below
39 | 37397_at | Homo sapiens platelet/endothelial cell adhesion molecule-1 (PECAM-1) gene, exon 16 and complete cds. | PECAM | L34657 | -6.6934 | Below
40 | 41127_at | solute carrier family 1 glutamate/neutral amino acid transporter member 4 | SLC1A4 | L14595 | -6.6892 | Below






Table 21. Genes Selected by T statistics for T-ALL

No. | Affymetrix number | Gene Name | Gene Symbol | Reference number | T-stat value | Above/Below Mean
1 | 38242_at | B cell linker protein | SLP65 | AF068180 | -115.8362 | Below
2 | 38319_at | CD3D antigen delta polypeptide TiT3 complex | CD3D | AA919102 | 27.6995 | Above
3 | 37988_at | CD79B antigen immunoglobulin-associated beta | CD79B | M89957 | -23.7294 | Below
4 | 38147_at | SH2 domain protein 1A Duncan s disease lymphoproliferative syndrome | SH2D1A | AL023657 | 22.4501 | Above
5 | 38522_s_at | CD22 antigen | CD22 | X52785 | -21.2795 | Below
6 | 35350_at | B cell RAG associated protein | BRAG | AB011170 | -19.1460 | Below
7 | 36277_at | Human membrane protein (CD3-epsilon) gene, exon 9. | CD3E | M23323 | 19.0859 | Above
8 | 38604_at | neuropeptide Y | NPY | AI198311 | -18.8194 | Below
9 | 33705_at | phosphodiesterase 4B cAMP-specific dunce Drosophila homolog phosphodiesterase E4 | PDE4B | L20971 | -18.6383 | Below
10 | 36878_f_at | major histocompatibility complex class II DQ beta 1 | HLA-DQB1 | M60028 | -18.5620 | Below
11 | 36638_at | connective tissue growth factor | CTGF | X78947 | -18.2772 | Below
12 | 32794_g_at | T cell receptor beta locus | TRB | X00437 | 17.9081 | Above
13 | 32174_at | solute carrier family 9 sodium/hydrogen exchanger isoform 3 regulatory factor 1 | SLC9A3R1 | AF015926 | 17.4427 | Above
14 | 160041_at | protein tyrosine phosphatase non-receptor type 18 brain-derived | PTPN18 | X79568 | -17.3412 | Below
15 | 38521_at | CD22 antigen | CD22 | X59350 | -17.0388 | Below
16 | 38018_g_at | CD79A antigen immunoglobulin-associated alpha | CD79A | U05259 | -16.7948 | Below
17 | 36571_at | topoisomerase DNA II beta 180kD | TOP2B | X68060 | -16.7508 | Below
18 | 1096_g_at | CD19 antigen | CD19 | M28170 | -16.4583 | Below
19 | 39318_at | T-cell leukemia/lymphoma 1A | TCL1A | X82240 | -16.2017 | Below
20 | 41710_at | hypothetical protein | LOC54103 | AL079277 | -15.9099 | Below
21 | 599_at | H2.0 Drosophila like homeo box 1 | HLX1 | M60721 | -15.5425 | Below
22 | 266_s_at | CD24 antigen small cell lung carcinoma cluster 4 antigen | CD24 | L33930 | -15.0123 | Below
23 | 36502_at | PFTAIRE protein kinase 1 | PFTK1 | AB020641 | -14.9972 | Below
24 | 39114_at | decidual protein induced by progesterone | DEPP | AB022718 | -14.9886 | Below
25 | 37539_at | RalGDS-like gene KIAA0959 protein | KIAA0959 | AB023176 | -14.6872 | Below
26 | 40775_at | integral membrane protein 2A | ITM2A | AL021786 | 14.5666 | Above
27 | 34033_s_at | leukocyte immunoglobulin-like receptor subfamily A with TM domain member 2 | LILRA2 | AF025531 | -14.3809 | Below



-73- 



BNSUUUD! «!VVO OJOBJ 1 40A2J_i> 



WO 03/083140 



PCT7US03/08486 



28 2031_s_at 

29 38051_at 

30 35794_at 

31 41156_g_at 

32 32979_at 

33 32562_at 

34 36536_at 

35 36108_at 

36 41734_at 

37 41153_fjit 

38 37710_at 

39 39893_at 

40 37908 at 



cyclin-dependent kinase inhibitor 
lAp21 Cipl 

mal T-cell differentiation protein 

KIAA0942 protein 

catenin cadherin-associated 
protein alpha 1 102kD 

GRB2-associated binding protein 
1 

endoglin Osler-Rendu-Weber 
syndrome 1 

schwainioniin interacting protein 1 

major histocompatibility complex 
class II DQ beta 1 

KIAA0870 protein 

Homo sapiens alphaE-catenin 
(CTNNA1) gene, exon 18 and 
complete cds. 

MADS box transcription enhancer 
factor 2 polypeptide C myocyte 
enhancer factor 2C 
guanine nucleotide binding protein 
G protein gamma 7 

guanine nucleotide binding protein 
11 



CDKN1A U03106 



MAL 

KIAA0942 

CTNNA1 

GAB1 

ENG 

SCHIP-1 
HLA-DQB1 

KIAA0870 
CTNNA1 

MEF2C 

GNG7 
GNG11 



X76220 

AB023159 

U03100 

U43885 

X72012 

AF070614 
M16276 

AB020677 
AF102803 

L0S895 

AB010414 
U31384 



-14.1071 Below 

14.0743 Above 

-13.9659 Below 

-13.8135 Below 

-13.5842 Below 

-13.4209 Below 

-13.4172 Below 

-13.3518 Below 

-13.2672 Below 

-12.7927 Below 

-12.7716 Below 

-12.7696 Below 

-12.7353 Below 



Table 22. Genes Selected by T statistics for TEL-AML1

No. | Affymetrix number | Gene Name | Gene Symbol | Reference number | T-stat value | Above/Below Mean
1 | 38578_at | tumor necrosis factor receptor superfamily member 7 | TNFRSF7 | M63928 | 15.2209 | Above
2 | 38203_at | potassium intermediate/small conductance calcium-activated channel subfamily N member 1 | KCNN1 | U69883 | 15.0804 | Above
3 | 36524_at | Rho guanine nucleotide exchange factor GEF 4 | ARHGEF4 | AB029035 | 14.9774 | Above
4 | 37780_at | piccolo presynaptic cytomatrix protein | PCLO | AB011131 | 14.1405 | Above
5 | 35614_at | transcription factor-like 5 basic helix-loop-helix | TCFL5 | AB012124 | 12.9369 | Above
6 | 160029_at | protein kinase C beta 1 | PRKCB1 | X07109 | 12.5429 | Above
7 | 1980_s_at | non-metastatic cells 2 protein NM23B expressed in | NME2 | X58965 | -12.5035 | Below
8 | 1488_at | protein tyrosine phosphatase receptor type K | PTPRK | L77886 | 12.3871 | Above
9 | 34194_at | Homo sapiens cDNA FLJ21697 | | AL049313 | 12.1089 | Above
10 | 37908_at | guanine nucleotide binding protein 11 | GNG11 | U31384 | 11.4322 | Above
11 | 40272_at | collapsin response mediator protein 1 | CRMP1 | D78012 | 11.0625 | Above
12 | 41097_at | telomeric repeat binding factor 2 | TERF2 | AF002999 | 11.0133 | Above
13 | [illegible] | [illegible] | | AL080190 | 10.8763 | Above
14 | 32730_at | Homo sapiens mRNA for KIAA1750 | | AL080059 | 10.7439 | Above
15 | 1325_at | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59423 | 10.5332 | Above
16 | 41819_at | FYN-binding protein FYB-120/130 | FYB | U93049 | 10.3692 | Above
17 | 1299_at | telomeric repeat binding factor 2 | TERF2 | X93512 | 10.2921 | Above
18 | 35665_at | phosphoinositide-3-kinase class 3 | PIK3C3 | Z46973 | 10.0568 | Above
19 | [illegible] | Rho-specific guanine nucleotide exchange factor p114 | P114-RHO-GEF | AB011093 | 9.8824 | Above
20 | 37280_at | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59912 | 9.8662 | Above
21 | 1936_s_at | proto-oncogene c-myc, alt. transcript 3, ORF 114 | | HG3523-HT4899 | -9.6621 | Below
22 | 1077_at | recombination activating gene 1 | RAG1 | M29474 | [illegible: y.45oi] | Above
23 | 38763_at | Human (clone D21-1) L-iditol-2 dehydrogenase gene, exon 9 and complete cds. | | L29254 | -9.2719 | Below
24 | 41295_at | GTT1 protein | GTT1 | AL041780 | -9.1813 | Below
25 | 36008_at | protein tyrosine phosphatase type IVA member 3 | PTP4A3 | AF041434 | 9.1682 | Above
26 | 38570_at | major histocompatibility complex class II DO beta | HLA-DOB | X03066 | 9.0394 | Above
27 | 32163_f_at | EST | | AA216639 | 9.0392 | Above
28 | 40570_at | forkhead box O1A rhabdomyosarcoma | FOXO1A | AF032885 | 8.9931 | Above
29 | 32724_at | phytanoyl-CoA hydroxylase Refsum disease | PHYH | AF023462 | 8.9571 | Above
30 | 932_i_at | zinc finger protein 91 HPF7 HTF10 | ZNF91 | L11672 | 8.8075 | Above
31 | 37343_at | inositol 1 4 5-triphosphate receptor type 3 | ITPR3 | U01062 | 8.7321 | Above
32 | 33447_at | myosin light polypeptide regulatory non-sarcomeric 20kD | MLCB | X54304 | -8.6848 | Below
33 | 35362_at | myosin X | MYO10 | AB018342 | 8.6700 | Above
34 | 38906_at | spectrin alpha erythrocytic 1 elliptocytosis 2 | SPTA1 | M61877 | 8.5010 | Above
35 | [illegible: 324Jf_at] | basic transcription factor 3 | BTF3 | HG1515-HT1515 | -8.4705 | Below
36 | 39329_at | actinin alpha 1 | ACTN1 | X15804 | -8.3219 | Below
37 | 577_at | midkine neurite growth-promoting factor 2 | MDK | M94250 | 8.2693 | Above
38 | 40729_s_at | nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor-like 1 | NFKBIL1 | Y14768 | 8.2000 | Above

39 | 41442_at | core-binding factor runt domain alpha subunit 2 translocated to 3 | CBFA2T3 | AB010419 | 8.0604 | Above
40 | 36275_at | Homo sapiens mRNA from chromosome 5q21-22 clone FBR89 | | AB002438 | 7.8550 | Above

4. Wilkins'

This method of selecting genes uses the weighted sum of three components to
estimate the discriminative value of each gene. The higher the score, the better the
gene is at discriminating between the two classes. The input to the scoring method is
preprocessed and normalized data. The idea of the metric is that a gene is a good
discriminator if: (1) it is expressed in one class and not in the other, or if the gene is
expressed in both classes, but significantly more so in one than the other, or (2) the
gene is present in most samples, and the data are pure, in the sense that there is a
threshold expression value for the gene where the gene generally has expression
levels larger than the threshold in one class, and smaller than the threshold in the other
class. The components of the metric were quantified as follows. For a gene, assume
PR_1 is the ratio of "present" samples to all samples in class 1, where present means
that the gene's expression value was not preprocessed to a constant (1). Assume PR_2
is defined similarly for class 2. The first component of the metric, M_1, is estimated as
the absolute difference between PR_1 and PR_2. This value is between 0 (when the gene
is equally present in both classes) and 1 (when the gene is expressed in one class and
not in the other). The second component of the metric, M_2, measures the extent to
which the gene is present overall, and is defined as the average of PR_1 and PR_2. The
final component, M_3, estimates the "purity", or existence of a threshold value. The
gene expression values for the present samples are sorted into ascending order and a
vector of their class labels is built, for example {+, +, +, -, +, +, -}. The next
step is to find the best place to partition the samples so that the expression values for
one class (maybe +) are less than the partition point, and the values from the other
class are larger. Let L_C1 and L_C2 be the number of class 1 and class 2 samples on the
left side of the partition, respectively. Assume R_C1 and R_C2 are defined similarly for
the right side of the partition. Then the purity is estimated as:

M_3 = max{L_C1 - L_C2 + R_C2 - R_C1, L_C2 - L_C1 + R_C1 - R_C2} / (number of total present samples)

Each possible partition is checked. In the example above, the partition {+, +, +, || +, -, +, -} is the best
partition, with a purity value of M_3 = 7 / 11 = 0.64. The score for the gene is the
weighted sum 0.5*M_1 + 0.25*M_2 + 0.25*M_3. The top 50 genes for each subgroup
selected by this metric are listed in Tables 23-29. For class prediction all 50 genes
were used, unless otherwise stated.
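
By way of illustration only, the three-component score described above may be sketched in Python as follows. The function names (purity, wilkins_score), the class coding (1 and 2), the absent_value parameter, and the choice to try split positions that leave one side empty are assumptions made for this sketch and are not taken from the specification.

```python
from typing import Sequence


def purity(sorted_labels: Sequence[int]) -> float:
    """M_3: best separation over all split points of the sorted label vector.

    sorted_labels holds the class (1 or 2) of each "present" sample,
    ordered by ascending expression value.
    """
    n = len(sorted_labels)
    if n == 0:
        return 0.0  # assumption: no present samples means no purity
    total_c1 = sum(1 for c in sorted_labels if c == 1)
    total_c2 = n - total_c1
    left_c1 = left_c2 = 0
    best = 0
    # Assumption: split positions 0..n are all tried, so one side may be empty.
    for k in range(n + 1):
        right_c1 = total_c1 - left_c1
        right_c2 = total_c2 - left_c2
        a = left_c1 - left_c2 + right_c2 - right_c1
        best = max(best, a, -a)  # the two arguments of max{...} are a and -a
        if k < n:  # move sample k to the left side for the next split
            if sorted_labels[k] == 1:
                left_c1 += 1
            else:
                left_c2 += 1
    return best / n


def wilkins_score(values: Sequence[float], labels: Sequence[int],
                  absent_value: float = 1.0) -> float:
    """Weighted sum 0.5*M_1 + 0.25*M_2 + 0.25*M_3 for one gene."""
    if len(values) != len(labels):
        raise ValueError("values and labels must have the same length")
    if any(c not in (1, 2) for c in labels):
        raise ValueError("labels must be 1 or 2")
    n1 = sum(1 for c in labels if c == 1)
    n2 = len(labels) - n1
    if n1 == 0 or n2 == 0:
        raise ValueError("each class needs at least one sample")
    # "Present" samples are those not preprocessed to the constant.
    present = sorted((v, c) for v, c in zip(values, labels) if v != absent_value)
    pr1 = sum(1 for _, c in present if c == 1) / n1
    pr2 = sum(1 for _, c in present if c == 2) / n2
    m1 = abs(pr1 - pr2)          # difference in presence between classes
    m2 = (pr1 + pr2) / 2         # overall presence
    m3 = purity([c for _, c in present])
    return 0.5 * m1 + 0.25 * m2 + 0.25 * m3
```

Under these assumptions, a gene that is present in every class 1 sample and absent in every class 2 sample has M_1 = 1, M_2 = 0.5 and M_3 = 1, giving a score of 0.875.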



Table 23. Genes Selected by Wilkins' for BCR-ABL

No. | Affymetrix number | Gene Name | Gene Symbol | Reference number | Train set score | Above/Below Mean
1 | 32319_at | tumor necrosis factor ligand superfamily member 4 tax-transcriptionally activated glycoprotein 1 34kD | TNFSF4 | AL022310 | 0.6354 | Above
2 | 37479_at | CD72 antigen | CD72 | M54992 | 0.6352 | Below
3 | 1211_s_at | CASP2 and RIPK1 domain containing adaptor with death domain | CRADD | U84388 | 0.6265 | Above
4 | 37397_at | platelet/endothelial cell adhesion molecule-1 (PECAM-1) gene | PECAM | L34657 | 0.6161 | Above
5 | 33162_at | insulin receptor | INSR | | 0.6118 | Below
6 | 39691_at | SH3-containing protein SH3GLB1 | SH3GLB1 | AB007960 | [illegible: u.ouoy] | Above
7 | 1558_g_at | p21/Cdc42/Rac1-activated kinase 1 yeast Ste20-related | PAK1 | U24152 | [illegible: U.OUo /] | Above
8 | 34759_at | Human hbc647 mRNA sequence | | U68494 | [illegible: O.oOol] | Above
9 | 33774_at | caspase 8 apoptosis-related cysteine protease | CASP8 | X98172 | 0.6040 | Above
10 | 1326_at | caspase 10 apoptosis-related cysteine protease | CASP10 | U60519 | 0.6021 | Above
11 | [illegible] | DKFZp5640222 from clone DKFZp5640222 | | AL050002 | 0.6010 | Above
12 | 35970_g_at | M-phase phosphoprotein 9 | MPHOSPH9 | N23137 | 0.5989 | Above
13 | 41273_at | FK506 binding protein 12-rapamycin associated protein 1 | FRAP1 | AL046940 | 0.5989 | Above
14 | 40798_s_at | a disintegrin and metalloproteinase domain 10 | ADAM10 | Z48579 | 0.5980 | Above
15 | 40953_at | calponin 3 acidic | CNN3 | S80562 | 0.5972 | Above
16 | 1434_at | phosphatase and tensin homolog mutated in multiple advanced cancers 1 | PTEN | U92436 | 0.5963 | Below
17 | 38966_at | glycoprotein synaptic 2 | GPSN2 | AF038958 | 0.5953 | Above
18 | 35991_at | Sm protein F | LSM6 | AA917945 | 0.5938 | Above
19 | 330_s_at | tubulin, alpha 1, isoform 44 | TUBA1 | HG2259-HT2348 | 0.5938 | Above
20 | 38032_at | KIAA0736 gene product | KIAA0736 | AB018279 | 0.5934 | Above
21 | 1983_at | cyclin D2 | CCND2 | X68452 | 0.5927 | Above
22 | 36194_at | low density lipoprotein-related protein-associated protein 1 alpha-2-macroglobulin receptor-associated protein 1 | LRPAP1 | M63959 | 0.5914 | Below
23 | 34460_at | peripheral benzodiazepine receptor-associated protein 1 | PRAX-1 | AB014512 | 0.5911 | Above
24 | 2001_g_at | ataxia telangiectasia mutated includes complementation groups A C and D | ATM | U26455 | 0.5910 | Above
25 | 31443_at | AML1 | AML1 | [illegible: b/o34o] | [illegible] | Above
26 | 33410_at | integrin alpha 6 | ITGA6 | S66213 | 0.5896 | Above
27 | 37472_at | mannosidase beta A lysosomal | MANBA | U60337 | 0.5887 | Below
28 | [illegible] | splicing factor arginine/serine-rich 1 splicing factor 2 alternate splicing factor | | [illegible] | [illegible: U.JO / /] | Below
29 | 38636_at | immunoglobulin superfamily containing leucine-rich repeat | ISLR | AB003184 | 0.5858 | Above
30 | 34314_at | ribonucleotide reductase M1 polypeptide | RRM1 | X59543 | 0.5858 | Below
31 | 36129_at | KIAA0397 gene product | KIAA0397 | AB007857 | 0.5858 | Above
32 | 40264_g_at | zinc finger protein-like 1 | ZFPL1 | AF001891 | 0.5858 | Above
33 | 37399_at | aldo-keto reductase family 1 member C3 3-alpha hydroxysteroid dehydrogenase type II | AKR1C3 | D17793 | 0.5852 | Above
34 | [illegible: 381oU_at] | lymphocyte antigen 75 | LY75 | | | Above
35 | 41649_at | FOXJ2 forkhead factor | LOC55810 | AF038177 | 0.5832 | Above
36 | 36591_at | tubulin alpha 1 testis specific | TUBA1 | X06956 | 0.5832 | Above
37 | 40167_s_at | CS box-containing WD protein | LOC55884 | AF038187 | 0.5832 | Above
38 | 2064_g_at | excision repair cross-complementing rodent repair deficiency complementation group | ERCC5 | L20046 | 0.5832 | Above
39 | 39729_at | Human natural killer cell enhancing factor (NKEFB) mRNA, complete cds. | NKEFB | L19185 | 0.5829 | Below
40 | 38270_at | poly ADP-ribose glycohydrolase | PARG | AF005043 | 0.5828 | Below
41 | 40613_at | uncharacterized hypothalamus protein HT012 | HT012 | AL031775 | 0.5819 | Below
42 | 39070_at | singed Drosophila like sea urchin fascin homolog like | SNL | U03057 | 0.5813 | Above
43 | 40782_at | short-chain dehydrogenase/reductase 1 | SDR1 | AF061741 | [illegible: U.Dol3] | Above
44 | 34256_at | sialyltransferase 9 CMP-NeuAc lactosylceramide alpha-2 3-sialyltransferase GM3 synthase | SIAT9 | AB018356 | 0.5797 | Above
45 | 41836_at | protein with polyglutamine repeat calcium ca2 homeostasis endoplasmic reticulum protein | ERPROT213-21 | U94836 | 0.5777 | Above
46 | 35681_r_at | zinc finger homeobox 1B | ZFHX1B | AB011141 | 0.5759 | Below
47 | 37190_at | WAS protein family member 1 | WASF1 | D87459 | 0.5759 | Below
48 | 32788_at | RAN binding protein 2 | RANBP2 | | [illegible: U.D /DO] | Above
49 | 828_at | prostaglandin E receptor 2 subtype EP2 53kD | PTGER2 | U19487 | 0.5740 | Above
50 | 38220_at | dihydropyrimidine dehydrogenase | DPYD | U20938 | 0.5737 | Above

Table 24: Genes Selected by Wilkins' for E2A-PBX1
No. | Affymetrix number | Gene Name | Gene Symbol | Reference number | Train set score | Above/Below Mean
1 | [illegible: 3^063 at] | pre-B-cell leukemia transcription factor 1 | PBX1 | M86546 | 0.8750 | Above
2 | 38994_at | STAT induced STAT inhibitor-2 | STATI2 | AF037989 | 0.8252 | Below
3 | 33355_at | Homo sapiens cDNA FLJ12900 fis clone NT2RP2004321 (by CELERA search of target sequence = PBX1) | PBX1 | AL049381 | 0.8040 | Above
4 | 40454_at | FAT tumor suppressor Drosophila homolog | FAT | X87241 | 0.7899 | Above
5 | 753_at | nidogen 2 | NID2 | D86425 | 0.7368 | Above
6 | 717_at | GS3955 protein | GS3955 | D87119 | 0.7306 | Above
7 | [illegible] | c-mer proto-oncogene tyrosine kinase | | | | Above
8 | 39070_at | singed Drosophila like sea urchin fascin homolog like | SNL | U03057 | 0.7271 | Below
9 | 1065_at | fms-related tyrosine kinase 3 | FLT3 | U02687 | 0.7160 | Below
10 | 36650_at | cyclin D2 | CCND2 | D13639 | 0.7151 | Below
11 | 33513_at | signaling lymphocytic activation molecule | SLAM | U33017 | 0.7096 | Above
12 | 33748_at | minor histocompatibility antigen HA-1 | KIAA0223 | D86976 | 0.7084 | Below
13 | 37225_at | KIAA0172 protein | KIAA0172 | D79994 | 0.7033 | Above
14 | 38717_at | DKFZP586A0522 protein | DKFZP586A0522 | AL050159 | 0.7003 | Below
15 | 854_at | B lymphoid tyrosine kinase | BLK | S76617 | 0.6982 | Above
16 | [illegible] | nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor-like 1 | NFKBIL1 | Y14768 | [illegible: \j.oy jo] | Below
17 | 40468_at | KIAA0554 protein | KIAA0554 | AB011126 | 0.6971 | Below
18 | 41266_at | integrin alpha 6 | ITGA6 | X53586 | 0.6965 | Below
19 | 36536_at | schwannomin interacting protein 1 | SCHIP-1 | AF070614 | 0.6938 | Below
20 | 362_at | protein kinase C zeta | PRKCZ | Z15108 | 0.6904 | Above
21 | 755_at | inositol 1 4 5-triphosphate receptor type 1 | ITPR1 | D26070 | 0.6877 | Below
22 | 307_at | arachidonate 5-lipoxygenase | ALOX5 | J03600 | 0.6875 | Below
23 | 39614_at | KIAA0802 protein | KIAA0802 | AB018345 | 0.6863 | Above
24 | 1563_s_at | tumor necrosis factor receptor superfamily member 1A | TNFRSF1A | M58286 | 0.6837 | Below
25 | 38748_at | adenosine deaminase RNA-specific B1 homolog of rat RED1 | ADARB1 | U76421 | 0.6763 | Above
26 | 41409_at | basement membrane-induced gene | ICB-1 | AF044896 | 0.6757 | Below
27 | 34892_at | tumor necrosis factor receptor superfamily member 10b | TNFRSF10B | AF016266 | 0.6726 | Below
28 | 40648_at | c-mer proto-oncogene tyrosine kinase | MERTK | U08023 | 0.6710 | Above
29 | 38408_at | transmembrane 4 superfamily member 2 | TM4SF2 | L10373 | 0.6667 | Below
30 | 34583_at | fms-related tyrosine kinase 3 | FLT3 | U02687 | 0.6665 | Below
31 | 36900_at | stromal interaction molecule 1 | STIM1 | U52426 | 0.6650 | Below
32 | 37625_at | interferon regulatory factor 4 | IRF4 | U52682 | 0.6636 | Above
33 | 38340_at | huntingtin interacting protein-1-related | KIAA0655 | AB014555 | 0.6609 | Above
34 | 1830_s_at | transforming growth factor beta 1 | TGFB1 | | [illegible: U.OOUo] | Below
35 | 37099_at | arachidonate 5-lipoxygenase-activating protein | ALOX5AP | AI806222 | 0.6605 | Below
36 | 38254_at | KIAA0882 protein | KIAA0882 | AB020689 | 0.6539 | Below
37 | 37641_at | Human gene for hepatitis C-associated microtubular aggregate protein p44, exon 9 and complete cds. | | D28915 | 0.6531 | Below
38 | 33865_at | adenovirus 5 E1A binding protein | BS69 | AA127624 | 0.6515 | Below
39 | 40729_s_at | nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor-like 1 | NFKBIL1 | Y14768 | 0.6502 | Below
40 | 40113_at | GS3955 protein | GS3955 | D87119 | 0.6476 | Above
41 | 32979_at | GRB2-associated binding protein 1 | GAB1 | U43885 | 0.6457 | Below
42 | 36591_at | tubulin alpha 1 testis specific | TUBA1 | X06956 | 0.6427 | Below
43 | 38739_at | v-ets avian erythroblastosis virus E26 oncogene homolog 2 | ETS2 | AF017257 | 0.6424 | Below
44 | 37485_at | fatty-acid-Coenzyme A ligase very long-chain 1 | FACVL1 | D88308 | 0.6363 | Above
45 | 538_at | CD34 antigen | CD34 | S53911 | 0.6326 | Below
46 | 37893_at | protein tyrosine phosphatase non-receptor type 2 | PTPN2 | AI828880 | 0.6318 | Above
47 | 41017_at | myosin-binding protein H | MYBPH | U27266 | 0.6297 | Above
48 | 37967_at | lymphocyte antigen 117 | LY117 | AF000424 | 0.6260 | Below
49 | 37281_at | KIAA0233 gene product | KIAA0233 | D87071 | 0.6250 | Below
50 | 35675_at | vinexin beta SH3-containing adaptor molecule-1 | SCAM-1 | AF037261 | 0.6229 | Below

Table 25. Genes Selected by Wilkins' for Hyperdiploid > 50

No. | Affymetrix number | Gene Name | Gene Symbol | Reference number | Train set score | Above/Below Mean
1 | 39878_at | protocadherin 9 | PCDH9 | AI524125 | 0.5838 | Below
2 | 41470_at | Prominin mouse like 1 | PROML1 | AF027208 | 0.5616 | Above
3 | 39069_at | AE-binding protein 1 | AEBP1 | AF053944 | 0.5423 | Below
4 | 1520_s_at | interleukin 1 beta | IL1B | X04500 | 0.5399 | Above
5 | 578_at | Human recombination activating protein (RAG2) gene, last exon | RAG2 | M94633 | 0.5208 | Below
6 | 32251_at | hypothetical protein FLJ21174 | FLJ21174 | AA149307 | 0.5164 | Above
7 | 40480_s_at | FYN oncogene related to SRC FGR YES | FYN | M14333 | 0.5090 | Above
8 | 38604_at | neuropeptide Y | NPY | AI198311 | 0.5083 | Above
9 | 40903_at | ATPase H transporting lysosomal vacuolar proton pump membrane sector associated protein M8-9 | ATP6M8-9 | AL049929 | 0.5080 | Above
10 | 38968_at | SH3-domain binding protein 5 BTK-associated | SH3BP5 | AB005047 | 0.5057 | Above
11 | 37272_at | inositol 1 4 5-trisphosphate 3-kinase B | ITPKB | X57206 | | Below
12 | 35688_g_at | mature T-cell proliferation 1 | MTCP1 | Z24459 | 0.5018 | Above
13 | 1488_at | protein tyrosine phosphatase receptor type K | PTPRK | L77886 | 0.4977 | Below
14 | 36885_at | spleen tyrosine kinase | SYK | L28824 | | Below
15 | 1630_s_at | tyrosine kinase syk | syk | HG3730-HT4000 | 0.4913 | Below
16 | 38317_at | transcription elongation factor A SII like 1 | TCEAL1 | M99701 | 0.4901 | Above
17 | 38649_at | KIAA0970 protein | KIAA0970 | AB023187 | 0.4898 | Below
18 | 39721_at | ephrin-B1 | EFNB1 | U09303 | 0.4895 | Above
19 | 33307_at | kraken-like | BK126B4.1 | AL022316 | 0.4880 | Below
20 | 38518_at | sex comb on midleg Drosophila like 2 | SCML2 | Y18004 | [illegible: u.4b /V] | Above
21 | 39402_at | interleukin 1 beta | IL1B | M15330 | | Above
22 | 36489_at | phosphoribosyl pyrophosphate synthetase 1 | | D00860 | 0.4718 | Above
23 | 37747_at | Human annexin V (ANX5) gene, exon 13. | ANX5 | U05770 | [illegible: U.4 / 1 /] | Above
24 | 40200_at | heat shock transcription factor 1 | HSF1 | M64673 | 0.4689 | Below
25 | 35940_at | POU domain class 4 transcription factor 1 | POU4F1 | X64624 | [illegible] | Above
26 | 35727_at | hypothetical protein FLJ20517 | FLJ20517 | AI249721 | 0.4675 | Below
27 | 1357_at | ubiquitin specific protease 4 proto-oncogene | USP4 | U20657 | 0.4670 | Below
28 | 36592_at | prohibitin | PHB | S85655 | 0.4668 | Above
29 | 37014_at | myxovirus influenza resistance 1 homolog of murine interferon-inducible protein p78 | MX1 | M33882 | [illegible] | [illegible]
30 | 40891_f_at | DNA segment on chromosome X unique 9879 expressed sequence | DXS9879E | X92896 | 0.4608 | Above
31 | 40846_g_at | interleukin enhancer binding factor 3 90Kd | ILF3 | U10324 | 0.4605 | Below
32 | 41132_r_at | heterogeneous nuclear ribonucleoprotein H2 H | HNRPH2 | U01923 | 0.4605 | Above
33 | 37280_at | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59912 | 0.4595 | Below
34 | 35939_s_at | POU domain class 4 transcription factor 1 | POU4F1 | L20433 | 0.4594 | Above
35 | 890_at | ubiquitin-conjugating enzyme E2A RAD6 homolog | UBE2A | M74524 | 0.4570 | Above
36 | 38738_at | SMT3 suppressor of mif two 3 yeast homolog 1 | SMT3H1 | X99584 | 0.4568 | Above
37 | 38458_at | Human cytochrome b5 (CYB5) gene, exon 6 and complete cds. | CYB5 | L39945 | 0.4552 | Above
38 | 38869_at | KIAA1069 protein | KIAA1069 | AB028992 | 0.4549 | Above
39 | 915_at | interferon-induced protein with tetratricopeptide repeats 1 | IFIT1 | M24594 | 0.4544 | Above
40 | 38408_at | transmembrane 4 superfamily member 2 | TM4SF2 | L10373 | 0.4535 | Above
41 | 39301_at | calpain 3 p94 | CAPN3 | X85030 | [illegible: U.4j3o] | Below
42 | 41425_at | Friend leukemia virus integration 1 | FLI1 | | |
43 | 2094_s_at | v-fos FBJ murine osteosarcoma viral oncogene homolog | FOS | K00650 | [illegible: 14] | Above
44 | 36605_at | transcription factor 4 | TCF4 | M74719 | 0.4497 | Above
45 | 37709_at | DNA segment numerous copies expressed probes GS1 gene | DXF68S1E | M86934 | 0.4493 | Above
46 | [illegible] | transmembrane trafficking protein | TMP21 | L40397 | 0.4488 | Above
47 | 171_at | von Hippel-Lindau binding protein 1 | VBP1 | | | Above
48 | 41490_at | phosphoribosyl pyrophosphate synthetase 2 | PRPS2 | Y00971 | 0.4466 | Above
49 | 36536_at | schwannomin interacting protein 1 | SCHIP-1 | AF070614 | 0.4448 | Above
50 | 35843_at | Homo sapiens mRNA cDNA DKFZp434D0935 | | L40402 | 0.4443 | Above

Table 26. Genes Selected by Wilkins' for MLL

No. | Affymetrix number | Gene Name | Gene Symbol | Reference number | Train set score | Above/Below Mean
1 | 39402_at | interleukin 1 beta | IL1B | M15330 | 0.7355 | Below
2 | 307_at | arachidonate 5-lipoxygenase | ALOX5 | J03600 | 0.7221 | Below
3 | 1389_at | membrane metallo-endopeptidase neutral endopeptidase enkephalinase CALLA CD10 | MME | J03779 | 0.7178 | Below
4 | | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59912 | 0.7021 | Below
5 | 36650_at | cyclin D2 | CCND2 | D13639 | 0.6759 | Below
6 | [illegible: 3 / VO al] | inhibitor of DNA binding 3 dominant negative helix-loop-helix protein | ID3 | AL021154 | 0.6743 | Below
7 | 1520_s_at | interleukin 1 beta | IL1B | X04500 | 0.6689 | Below
8 | [illegible: 4Uyli_at] | ATPase Ca transporting plasma membrane 4 | ATP2B4 | W28589 | 0.6684 | Below
9 | 36536_at | schwannomin interacting protein 1 | SCHIP-1 | AF070614 | 0.6554 | Below
10 | 37398_at | platelet/endothelial cell adhesion molecule CD31 antigen | PECAM1 | AA100961 | [illegible: yJ.ODHO] | Below
11 | 39114_at | decidual protein induced by progesterone | DEPP | AB022718 | 0.6478 | Below
12 | 37967_at | lymphocyte antigen 117 | LY117 | AF000424 | 0.6432 | Below
13 | 1325_at | MAD mothers against decapentaplegic Drosophila homolog 1 | MADH1 | U59423 | 0.6421 | Below
14 | 38336_at | KIAA1013 protein | KIAA1013 | AB023230 | 0.6395 | Below
15 | 577_at | midkine neurite growth-promoting factor 2 | MDK | M94250 | 0.6363 | Below
16 | 38671_at | KIAA0620 protein | KIAA0620 | AB014520 | 0.6353 | Below
17 | 33412_at | LGALS1 Lectin, galactoside-binding, soluble, 1 | LGALS1 | AI535946 | 0.6351 | Above
18 | 40451_at | hypothetical protein FLJ21434 | FLJ21434 | AL080203 | 0.6350 | Below
19 | 36908_at | Human macrophage mannose receptor (MRC1) gene, exon 30. | MRC1 | M93221 | 0.6290 | Below
20 | 963_at | ligase IV DNA ATP-dependent | [illegible: JUJLVJH] | X83441 | 0.6282 | Below
21 | 41346_at | like-glycosyltransferase | LARGE | AJ007583 | 0.6214 | Below
22 | 32207_at | membrane protein palmitoylated 1 55kD | MPP1 | M64925 | 0.6155 | Below
23 | 2062_at | insulin-like growth factor binding protein 7 | IGFBP7 | L19182 | 0.6145 | Above
24 | 38408_at | transmembrane 4 superfamily member 2 | TM4SF2 | L10373 | 0.6137 | Below
25 | 854_at | B lymphoid tyrosine kinase | BLK | S76617 | 0.6075 | Above
26 | 32193_at | plexin C1 | | AF030339 | 0.6065 | Above
27 | 35939_s_at | POU domain class 4 transcription factor 1 | POU4F1 | L20433 | 0.6046 | Below
28 | 33705_at | phosphodiesterase 4B cAMP-specific dunce Drosophila homolog phosphodiesterase E4 | PDE4B | L20971 | 0.5991 | Below
29 | 34168_at | deoxynucleotidyltransferase terminal | DNTT | M11722 | 0.5979 | Below
30 | 36383_at | v-ets avian erythroblastosis virus E26 oncogene related | ERG | M17254 | 0.5976 | Below
31 | 38968_at | SH3-domain binding protein 5 BTK-associated | SH3BP5 | AB005047 | 0.5976 | Below
32 | 39263_at | 2 5 oligoadenylate synthetase 2 | OAS2 | M87434 | 0.5967 | Below
33 | 39329_at | actinin alpha 1 | ACTN1 | X15804 | 0.5953 | Below
34 | 34699_at | CD2-associated protein | CD2AP | AL050105 | [illegible] | Below
35 | 1267_at | protein kinase C eta | PRKCH | M55284 | 0.5941 | Below
36 | 35172_at | tyrosylprotein sulfotransferase 2 | TPST2 | AF049891 | 0.5937 | Below
37 | 38124_at | midkine neurite growth-promoting factor 2 | MDK | X55110 | 0.5936 | Below
38 | 33813_at | tumor necrosis factor receptor superfamily member 1B | TNFRSF1B | AI813532 | 0.5934 | Below
39 | 34176_at | hypothetical protein from clone 643 | LOC57228 | AF091087 | 0.5930 | Below
40 | 39424_at | tumor necrosis factor receptor superfamily member 14 herpesvirus entry mediator | TNFRSF14 | U70321 | 0.5930 | Below
41 | 40729_s_at | nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor-like 1 | NFKBIL1 | Y14768 | 0.5905 | Below
42 | 32607_at | brain acid-soluble protein 1 | BASP1 | AF039656 | 0.5905 | Above
43 | 38342_at | KIAA0239 protein | KIAA0239 | D87076 | 0.5896 | Below
44 | 32533_s_at | vesicle-associated membrane protein 5 myobrevin | VAMP5 | AF054825 | 0.5880 | Below
45 | 39330_s_at | actinin alpha 1 | ACTN1 | M95178 | 0.5867 | Below
46 | 40519_at | protein tyrosine phosphatase receptor type C | PTPRC | Y00638 | 0.5848 | Above
47 | 39338_at | S100 calcium-binding protein A10 annexin II ligand calpactin I light polypeptide p11 | S100A10 | AI201310 | 0.5844 | Above
48 | 35940_at | POU domain class 4 transcription factor 1 | POU4F1 | X64624 | 0.5824 | Below
49 | 39712_at | S100 calcium-binding protein A13 | S100A13 | AI541308 | 0.5818 | Below
50 | 39379_at | Homo sapiens mRNA cDNA DKFZp586C1019 from clone DKFZp586C1019 | | AL049397 | 0.5811 | Above



Table 27: Genes Selected by Wilkins' for Novel Risk Group

No.  Affymetrix number  Gene Name  Gene Symbol  Reference number  Train set score  Above/Below Mean
1  31892_at  protein tyrosine phosphatase receptor type M  PTPRM  X58288  0.8668  Above
2  41734_at  KIAA0870 protein  KIAA0870  AB020677  0.8614  Below
3  995_g_at  protein tyrosine phosphatase receptor type M  PTPRM  X58288  0.8505  Above
4  994_at  protein tyrosine phosphatase receptor type M  PTPRM  X58288  0.7694  Above
5  37967_at  lymphocyte antigen 117  LY117  AF000424  0.7399  Below
6  34676_at  KIAA1099 protein  KIAA1099  AB029022  0.7298  Above
7  41159_at  Clathrin heavy polypeptide Hc  CLTC  D21260  0.7283  Above
8  39728_at  interferon gamma-inducible protein 30  IFI30  J03909  0.7138  Below
9  37542_at  lipoma HMGIC fusion partner-like 2  LHFPL2  D86961  0.7069  Above
10  35350_at  B cell RAG associated protein  BRAG  AB011170  0.7049  Below
11  41438_at  KIAA1451 protein  KIAA1451  AL049923  0.6999  Below
12  34370_at  Archain 1  ARCN1  X81198  0.6999  Below
13  36029_at  chromosome 11 open reading frame 8  C11ORF8  U57911  0.6964  Above
14  37960_at  carbohydrate chondroitin 6/keratan sulfotransferase 2  CHST2  AB014679  0.6947  Above
15  35869_at  MD-1 RP105-associated  MD-1  AB020499  0.6908  Below
16  36601_at  Vinculin  VCL  M33308  0.6908  Below
17  40775_at  Integral membrane protein 2A  ITM2A  AL021786  0.6879  Above
18  37281_at  KIAA0233 gene product  KIAA0233  D87071  0.6837  Below
19  957_at  Arrestin, beta 2  ARRB2  HG2059-HT2114  0.6744  Below
20  33284_at  myeloperoxidase  MPO  M19507  0.6712  Below
21  40585_at  adenylate cyclase 7  ADCY7  D25538  0.6712  Below
22  37908_at  guanine nucleotide binding protein 11  GNG11  U31384  0.6656  Above
23  40167_s_at  CS box-containing WD protein  LOC55884  AF038187  0.6581  Below
24  38576_at  H2B histone family member B  H2BFB  AJ223353  0.6576  Below
25  36591_at  tubulin alpha 1 testis specific  TUBA1  X06956  0.6576  Below
26  37712_g_at  MADS box transcription enhancer factor 2 polypeptide C myocyte enhancer factor 2C  MEF2C  S57212  0.6576  Below
27  33924_at  KIAA1091 protein  KIAA1091  AB029014  0.6484  Below
28  32724_at  phytanoyl-CoA hydroxylase Refsum disease  PHYH  AF023462  0.6466  Above
29  33358_at  EST (retina)  -  W29087  0.6457  Above
30  33740_at  chromosome 1 open reading frame 2  C1ORF2  AF023268  0.6441  Below
31  36588_at  KIAA0810 protein  KIAA0810  AB018353  0.6441  Below
32  38802_at  progesterone binding protein  HPR6.6  Y12711  0.6441  Below
33  38408_at  transmembrane 4 superfamily member 2  TM4SF2  L10373  0.6440  Below
34  32227_at  proteoglycan 1 secretory granule  PRG1  X17042  0.6409  Below
35  34840_at  Homo sapiens cDNA FLJ22642 fis clone HSI06970  -  [illegible]  0.6409  Below
36  1131_at  mitogen-activated protein kinase kinase 2  MAP2K2  L11285  0.6409  Below
37  33410_at  integrin alpha 6  ITGA6  [illegible]  0.6391  Above
38  38006_at  CD48 antigen B-cell membrane protein  CD48  M37766  0.6342  Below
39  33907_at  eukaryotic translation initiation factor 4 gamma 3  EIF4G3  AF012072  0.6304  Below
40  41273_at  FK506 binding protein 12-rapamycin associated protein 1  FRAP1  AL046940  0.6304  Below
41  39781_at  insulin-like growth factor-binding protein 4  IGFBP4  U20982  0.6301  Below
42  39893_at  guanine nucleotide binding protein G protein gamma 7  GNG7  AB010414  0.6301  Below
43  37326_at  proteolipid protein 2 colonic epithelium-enriched  PLP2  U93305  0.6267  Below
44  36687_at  cytochrome c oxidase subunit VIIb  COX7B  [illegible]  0.6266  Below
45  40423_at  KIAA0903 protein  KIAA0903  AB020710  0.6254  Above
46  32542_at  four and a half LIM domains 1  FHL1  AF063002  0.6236  Below
47  33232_at  cysteine-rich protein 1 intestinal  CRIP1  AI017574  0.6211  Below
48  37280_at  MAD mothers against decapentaplegic Drosophila homolog 1  MADH1  U59912  [illegible]  Above
49  1325_at  MAD mothers against decapentaplegic Drosophila homolog 1  MADH1  U59423  0.6208  Above
50  40729_s_at  nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor-like 1  NFKBIL1  Y14768  0.6199  Below
Table 28. Genes Selected by Wilkins' for T-ALL

No.  Affymetrix number  Gene Name  Gene Symbol  Reference number  Train set score  Above/Below Mean
1  38242_at  B cell linker protein  SLP65  AF068180  0.8683  Below
2  37988_at  CD79B antigen immunoglobulin-associated beta  CD79B  M89957  0.8422  Below
3  1096_g_at  CD19 antigen  CD19  M28170  0.8181  Below
4  39318_at  T-cell leukemia/lymphoma 1A  TCL1A  X82240  0.8128  Below
5  38018_g_at  CD79A antigen immunoglobulin-associated alpha  CD79A  U05259  0.8127  Below
6  36878_f_at  major histocompatibility complex class II DQ beta 1  HLA-DQB1  M60028  0.8053  Below
7  38147_at  SH2 domain protein 1A Duncan s disease lymphoproliferative syndrome  SH2D1A  AL023657  0.8016  Above
8  35350_at  B cell RAG associated protein  BRAG  AB011170  0.7914  Below
9  38051_at  mal T-cell differentiation protein  MAL  X76220  0.7900  Above
10  266_s_at  CD24 antigen small cell lung carcinoma cluster 4 antigen  CD24  L33930  0.7867  Below
11  38521_at  CD22 antigen  CD22  X59350  0.7856  Below
12  37344_at  major histocompatibility complex class II DM alpha  HLA-DMA  X62744  0.7835  Below
13  34033_s_at  leukocyte immunoglobulin-like receptor subfamily A with TM domain member 2  LILRA2  AF025531  0.7761  Below
14  36638_at  connective tissue growth factor  CTGF  X78947  0.7755  Below
15  38213_at  galactosidase alpha  GLA  U78027  0.7701  Below
16  41734_at  KIAA0870 protein  KIAA0870  AB020677  0.7693  Below
17  37711_at  MADS box transcription enhancer factor 2 polypeptide C myocyte enhancer factor 2C  MEF2C  S57212  0.7560  Below
18  36239_at  POU domain class 2 associating factor 1  POU2AF1  Z49194  0.7440  Below
19  38319_at  CD3D antigen delta polypeptide TiT3 complex  CD3D  AA919102  0.7426  Above
20  38894_g_at  neutrophil cytosolic factor 4 40kD  NCF4  AL008637  0.7422  Below
21  33705_at  phosphodiesterase 4B cAMP-specific dunce Drosophila homolog phosphodiesterase E4  PDE4B  L20971  0.7414  Below
22  38017_at  CD79A antigen immunoglobulin-associated alpha  CD79A  U05259  0.7360  Below
23  41156_g_at  catenin cadherin-associated protein alpha 1 102kD  CTNNA1  U03100  0.7315  Below
24  38994_at  STAT induced STAT inhibitor-2  STATI2  AF037989  0.7292  Below
25  37710_at  MADS box transcription enhancer factor 2 polypeptide C myocyte enhancer factor 2C  MEF2C  L08895  0.7283  Below
26  41155_at  catenin cadherin-associated protein alpha 1 102kD  CTNNA1  U03100  0.7278  Below
27  40570_at  forkhead box O1A rhabdomyosarcoma  FOXO1A  AF032885  0.7258  Below
28  34224_at  fatty acid desaturase 3  FADS3  AC004770  0.7254  Below
29  38604_at  neuropeptide Y  NPY  AI198311  0.7212  Below
30  36773_f_at  major histocompatibility complex class II DQ beta 1  HLA-DQB1  M81141  0.7197  Below
31  32562_at  endoglin Osler-Rendu-Weber syndrome 1  ENG  X72012  0.7180  Below
32  36502_at  PFTAIRE protein kinase 1  PFTK1  AB020641  0.7179  Below
33  37180_at  phospholipase C gamma 2 phosphatidylinositol-specific  PLCG2  X14034  0.7114  Below
34  38893_at  neutrophil cytosolic factor 4 40kD  NCF4  AL008637  0.7100  Below
35  387_at  cyclin-dependent kinase 9 CDC2-related kinase  CDK9  X80230  0.7024  Below
36  32035_at  Human MHC class II HLA-DRw53-associated glycoprotein beta-chain mRNA complete cds  -  M16942  0.6992  Below
37  41153_f_at  Homo sapiens alphaE-catenin (CTNNA1) gene  CTNNA1  [illegible]  [illegible]  Below
38  40780_at  C-terminal binding protein 2  CTBP2  AF016507  [illegible]  Below
39  40775_at  integral membrane protein 2A  ITM2A  AL021786  [illegible]  Above
40  39402_at  interleukin 1 beta  IL1B  M15330  0.6945  Below
41  38522_s_at  CD22 antigen  CD22  X52785  0.6945  Below
42  41166_at  immunoglobulin heavy constant mu  IGHM  X58529  0.6941  Below
43  36937_s_at  PDZ and LIM domain 1 elfin  PDLIM1  U90878  0.6937  Below
44  38833_at  Human mRNA for SB classII histocompatibility antigen alpha-chain  -  X00457  0.6925  Below
45  2047_s_at  junction plakoglobin  JUP  M23410  0.6920  Below
46  36277_at  Human membrane protein (CD3-epsilon) gene exon 9  CD3E  M23323  0.6899  Above
47  40688_at  linker for activation of T cells  LAT  AJ223280  0.6898  Above
48  39389_at  CD9 antigen p24  CD9  M38690  0.6879  Below
49  33162_at  Insulin receptor  INSR  X02160  0.6879  Below
50  31891_at  chitinase 3-like 2  CHI3L2  U58515  0.6872  Above



Table 29. Genes Selected by Wilkins' for TEL-AML1

No.  Affymetrix number  Gene Name  Gene Symbol  Reference number  Train set score  Above/Below Mean
1  37780_at  Piccolo presynaptic cytomatrix protein  PCLO  AB011131  0.7121  Above
2  38203_at  potassium intermediate/small conductance calcium-activated channel subfamily N member 1  KCNN1  U69883  0.7086  Above
3  36524_at  Rho guanine nucleotide exchange factor GEF 4  ARHGEF4  [illegible]  [illegible]  Above
4  38578_at  tumor necrosis factor receptor superfamily member 7  TNFRSF7  [illegible]  0.6718  Above
5  32730_at  Homo sapiens mRNA for KIAA1750 protein partial cds  -  [illegible]  [illegible]  Above
6  34194_at  Homo sapiens cDNA FLJ21697 fis clone COL09740  -  AL049313  0.6518  Above
7  40272_at  collapsin response mediator protein 1  CRMP1  D78012  0.6160  Above
8  41819_at  FYN-binding protein FYB-120/130  FYB  U93049  0.6058  Above
9  1488_at  protein tyrosine phosphatase receptor type K  PTPRK  L77886  0.6056  Above
10  35665_at  phosphoinositide-3-kinase class 3  PIK3C3  [illegible]  0.6022  Above
11  35614_at  transcription factor-like 5 basic helix-loop-helix  TCFL5  AB012124  [illegible]  Above
12  36008_at  protein tyrosine phosphatase type IVA member 3  PTP4A3  AF041434  0.5976  Above
13  35362_at  Myosin X  MYO10  [illegible]  [illegible]  [illegible]
14  37908_at  guanine nucleotide binding protein 11  GNG11  U31384  [illegible]  Above
15  39329_at  Actinin alpha 1  ACTN1  X15804  [illegible]  Below
16  1936_s_at  proto-oncogene c-myc, alt. transcript 3, ORF114  -  [illegible]-HT4899  [illegible]  Below
17  33690_at  Homo sapiens mRNA cDNA DKFZp434A202  DKFZp434A202  AL080190  0.5725  Above
18  39389_at  CD9 antigen p24  CD9  M38690  0.5684  Below
19  37343_at  inositol 1 4 5-triphosphate receptor type 3  ITPR3  U01062  0.5642  Above
20  1299_at  telomeric repeat binding factor 2  TERF2  X93512  0.5585  Above
21  38652_at  hypothetical protein FLJ20154  FLJ20154  AF070644  0.5563  Above
22  38763_at  (clone D21-1) L-iditol-2 dehydrogenase gene  -  L29254  0.5535  Below
23  37724_at  v-myc avian myelocytomatosis viral oncogene homolog  MYC  V00568  [illegible]  Below
24  36937_s_at  PDZ and LIM domain 1 elfin  PDLIM1  U90878  0.5506  Below
25  1325_at  MAD mothers against decapentaplegic Drosophila homolog 1  MADH1  U59423  0.5482  Above
26  41549_s_at  adaptor-related protein complex 1 sigma 2 subunit  AP1S2  AF091077  0.5474  Below
27  [illegible]  hypothetical protein  FLJ20500  AA522530  0.5471  Below
28  32724_at  phytanoyl-CoA hydroxylase Refsum disease  PHYH  AF023462  0.5459  Above
29  31786_at  Sam68-like phosphotyrosine protein T-STAR  T-STAR  AF051321  0.5403  Above
30  38570_at  major histocompatibility complex class II DO beta  HLA-DOB  X03066  0.5384  Above
31  39330_s_at  actinin alpha 1  ACTN1  M95178  0.5375  Below
32  36493_at  lymphocyte-specific protein 1  LSP1  M33552  0.5356  Below
33  574_s_at  caspase 1 apoptosis-related cysteine protease interleukin 1 beta convertase  CASP1  M87507  0.5336  Below
34  32224_at  KIAA0769 gene product  KIAA0769  AB018312  0.5326  Above
35  1077_at  recombination activating gene 1  RAG1  M29474  0.5302  Above
36  37280_at  MAD mothers against decapentaplegic Drosophila homolog 1  MADH1  U59912  0.5283  Above
37  41200_at  CD36 antigen collagen type I receptor thrombospondin receptor like 1  CD36L1  Z22555  0.5261  Above
38  36009_at  hypothetical protein  CL683  AF091092  0.5259  Below
39  36933_at  N-myc downstream regulated  NDRG1  D87953  0.5254  Below
40  1126_s_at  Human cell surface glycoprotein CD44 (CD44) gene, 3' end of long tailed isoform.  CD44  L05424  0.5232  Below
41  39824_at  ESTs  -  AI391564  0.5231  Above
42  38078_at  filamin B beta actin-binding protein-278  FLNB  AF042166  0.5208  Below
43  38127_at  syndecan 1  SDC1  Z48199  0.5199  Above
44  32941_at  interferon consensus sequence binding protein 1  ICSBP1  M91196  0.5195  Below
45  37276_at  IQ motif containing GTPase activating protein 2  IQGAP2  U51903  0.5191  Below
46  34768_at  DKFZP564E1962 protein  DKFZP564E1962  AL080080  0.5184  Below
47  39781_at  insulin-like growth factor-binding protein 4  IGFBP4  U20982  0.5173  Below
48  37918_at  integrin beta 2 antigen CD18 p95 lymphocyte function-associated antigen 1 macrophage antigen 1 mac-1 beta subunit  ITGB2  M15395  0.5162  Below
49  41490_at  phosphoribosyl pyrophosphate synthetase 2  PRPS2  Y00971  0.5155  Below
50  41814_at  fucosidase alpha-L-1 tissue  FUCA1  M29877  0.5101  Above



5. SOM/DAV 

The 10,991 probe sets that passed the variation filter were used for subsequent
selection of discriminating genes using the self-organizing map (SOM) and
discriminant analysis with variance (DAV) programs in the GeneMaths software
package (version 1.5, Applied Maths, Belgium). The subgroups for which genes were
selected included T-lineage ALL, TEL-AML1, E2A-PBX1, MLL rearrangement,
BCR-ABL, hyperdiploid ALL (chromosomal number > 50) and the novel subgroup
described in the text of the paper. The target number of total genes chosen by each
algorithm was 500.

The SOM analysis was performed using a 30 x 18 node format to enable an
optimal number of genes per node (~20 genes per node). Nodes that contained genes
whose expression varied more than 2-fold from the mean in more than 70% of the
samples in a particular subgroup were chosen. A total of 451 genes were chosen
using the SOM algorithm and 443 genes using the DAV algorithm. The combined
gene sets contained 755 unique genes, of which 185 were present in both subsets. 2-D
hierarchical clustering of the genes and samples was performed using Pearson's
correlation coefficient as the metric and the unweighted pair group method using
arithmetic averages (UPGMA). Approximately 10% of the genes that were found to
have correlation coefficients less than 0.7 in each branch of the dendrogram were
removed and the process was repeated iteratively until the correlation coefficient for
all genes within a branch was > 0.7, or until the removal of additional genes resulted in
a deterioration of the class distinction as indicated by inappropriate clustering of
cases. Through this approach a subset of 215 genes was selected that optimally
separated the 7 subgroups. These genes are listed in Tables 30-36. The selection of
genes by this approach does not provide for a ranking. For class prediction, between
20 and 30 genes were used for each genetic subgroup, unless otherwise stated.
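
The node-selection rule described above can be illustrated with a minimal sketch. It is not the GeneMaths implementation; the function names and data layout are hypothetical. It assumes that "the mean" is the gene's mean expression across all samples, that expression values are positive, and that a node is kept if at least one of its genes meets the criterion.

```python
from statistics import mean

GRID_ROWS, GRID_COLS = 30, 18   # SOM node format given in the text
N_PROBE_SETS = 10991            # probe sets that passed the variation filter


def genes_per_node(n_genes=N_PROBE_SETS, rows=GRID_ROWS, cols=GRID_COLS):
    """Average number of genes per SOM node for the given grid."""
    return n_genes / (rows * cols)


def gene_passes(values, subgroup, fold=2.0, min_fraction=0.7):
    """True if the gene differs more than `fold` from its overall mean
    (in either direction) in more than `min_fraction` of the subgroup samples.

    values   -- positive expression values, one per sample (assumed layout)
    subgroup -- indices of the samples belonging to the subgroup
    """
    overall_mean = mean(values)
    hits = sum(
        1
        for i in subgroup
        if values[i] > fold * overall_mean or values[i] < overall_mean / fold
    )
    return hits / len(subgroup) > min_fraction


def select_nodes(nodes, expression, subgroup, fold=2.0, min_fraction=0.7):
    """Return the ids of nodes holding at least one qualifying gene (assumption).

    nodes      -- dict: node id -> list of probe set ids
    expression -- dict: probe set id -> list of expression values
    """
    return [
        node_id
        for node_id, probes in nodes.items()
        if any(gene_passes(expression[p], subgroup, fold, min_fraction) for p in probes)
    ]
```

With the 30 x 18 grid and 10,991 probe sets, `genes_per_node()` gives roughly 20 genes per node, in line with the figure quoted in the text.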



Table 30. Genes selected by DAV-SOM for BCR-ABL

No.  Affymetrix number  Gene Name  GeneSymbol  Reference number  Above/Below Mean
1  39250_at  nephroblastoma overexpressed gene  NOV  X96584  Above
2  37600_at  extracellular matrix protein 1  ECM1  U68186  Above
3  38312_at  DKFZp5640222 from clone DKFZp5640222  -  AL050002  Above
4  38342_at  KIAA0239 protein  KIAA0239  D87076  Above
5  39712_at  S100 calcium-binding protein A13  S100A13  AI541308  Above
6  39730_at  v-abl Abelson murine leukemia viral oncogene homolog 1  ABL1  X16416  Above
7  39781_at  Insulin-like growth factor-binding protein 4  IGFBP4  U20982  Above
8  40051_at  TRAM-like protein  KIAA0057  D31762  Above
9  40504_at  paraoxonase 2  PON2  AF001601  Above
10  33362_at  Cdc42 effector protein 3  CEP3  AF094521  Above
11  33404_at  adenylyl cyclase-associated protein 2  CAP2  U02390  Above
12  34362_at  solute carrier family 2 facilitated glucose transporter member 5  SLC2A5  M55531  Above
13  36591_at  Tubulin alpha 1 testis specific  TUBA1  X06956  Above
14  38077_at  collagen type VI alpha 3  COL6A3  X52022  Above
15  40196_at  HYA22 protein  HYA22  D88153  Above
16  1911_s_at  Growth arrest and DNA-damage-inducible alpha  GADD45A  M60974  Above
17  1702_at  interleukin 2 receptor alpha  IL2RA  X01057  Above
18  1635_at  Human proto-oncogene tyrosine-protein kinase (ABL) gene, exon 1a and exons 2-10, complete cds.  ABL  U07563  Above
19  1636_g_at  Human proto-oncogene tyrosine-protein kinase (ABL) gene, exon 1a and exons 2-10, complete cds.  ABL  U07563  Above
20  1326_at  Caspase 10 apoptosis-related cysteine protease  CASP10  U60519  Above
21  330_s_at  Tubulin, alpha 1, isoform 44  TUBA1  HG2259-HT2348  Above


Table 31. Genes selected by DAV-SOM for E2A-PBX1

No.  Affymetrix number  Gene Name  GeneSymbol  Reference number  Above/Below Mean
1  33513_at  signaling lymphocytic activation molecule  SLAM  U33017  Above
2  37479_at  CD72 antigen  CD72  M54992  Above
3  37485_at  fatty-acid-Coenzyme A ligase very long-chain 1  FACVL1  D88308  Above
4  39614_at  KIAA0802 protein  KIAA0802  AB018345  Above
5  [illegible]  KIAA0922 protein  KIAA0922  AB023139  Above
6  40648_at  c-mer proto-oncogene tyrosine kinase  MERTK  U08023  Above
7  41017_at  Myosin-binding protein H  MYBPH  U27266  Above
8  41425_at  Friend leukemia virus integration 1  FLI1  M98833  Above
9  41862_at  KIAA0056 protein  KIAA0056  D29954  Above
10  32063_at  pre-B-cell leukemia transcription factor 1  PBX1  M86546  Above
11  37225_at  KIAA0172 protein  KIAA0172  D79994  Above
12  38285_at  mu-crystallin gene  -  AF039397  Above
13  38286_at  KIAA1071 protein  KIAA1071  AB028994  Above
14  38340_at  huntingtin interacting protein-1-related  KIAA0655  AB014555  Above
15  39379_at  cDNA DKFZp586C1019 from clone DKFZp586C1019  -  AL049397  Above
16  39402_at  interleukin 1 beta  IL1B  M15330  Above
17  40454_at  FAT tumor suppressor Drosophila homolog  FAT  X87241  Above
18  41139_at  melanoma antigen family D 1  MAGED1  W26633  Above
19  41146_at  ADP-ribosyltransferase NAD poly ADP-ribose polymerase  ADPRT  J03473  Above
20  33355_at  Homo sapiens cDNA FLJ12900 fis clone NT2RP2004321  -  AL049381  Above
21  34783_s_at  BUB3 budding uninhibited by benzimidazoles 3 yeast homolog  BUB3  AF047473  Above
22  36179_at  mitogen-activated protein kinase-activated protein kinase 2  MAPKAPK2  U12779  Above
23  36589_at  aldo-keto reductase family 1 member B1 aldose reductase  AKR1B1  X15414  Above
24  [illegible]  KIAA0247 gene product  KIAA0247  D87434  Above
25  [illegible]  Nuclear factor of kappa light polypeptide gene enhancer in B-cells 1 p105  NFKB1  [illegible]  Above
26  1786_at  c-mer proto-oncogene tyrosine kinase  MERTK  U08023  Above
27  1520_s_at  interleukin 1 beta  IL1B  [illegible]  Above
28  1287_at  ADP-ribosyltransferase NAD poly ADP-ribose polymerase  ADPRT  [illegible]  Above
29  854_at  B lymphoid tyrosine kinase  BLK  S[illegible]6617  Above
30  753_at  Nidogen 2  NID2  D86425  Above
31  [illegible]  nucleoside phosphorylase  NP  X00737  Above
32  362_at  Protein kinase C zeta  PRKCZ  Z15108  Above



Table 32. Genes selected by DAV/SOM for Hyperdiploid >50

No.  Affymetrix number  Gene Name  GeneSymbol  Reference number  Above/Below Mean
1  [illegible]  prosaposin variant Gaucher disease and variant metachromatic leukodystrophy  PSAP  J03077  Above
2  38242_at  B cell linker protein  SLP65  AF068180  Above
3  [illegible]  sex comb on midleg Drosophila like 2  SCML2  Y18004  Above
4  [illegible]  RAB9 member RAS oncogene family  RAB9  U44103  Above
5  [illegible]  KIAA0179 protein  KIAA0179  D80001  Above
6  33228_g_at  interleukin 10 receptor beta  IL10RB  AI984234  Above
7  33753_at  KIAA0666 protein  KIAA0666  AB014566  Above
8  [illegible]  Rac/Cdc42 guanine exchange factor GEF 6  ARHGEF6  D25304  Above
9  [illegible]  SH3-domain binding protein 5 BTK-associated  SH3BP5  AB005047  Above
10  [illegible]  CGI-76 protein  LOC51632  AI557497  Above
11  39329_at  Actinin alpha 1  ACTN1  X15804  Above
12  39389_at  CD9 antigen p24  CD9  M38690  Above
13  [illegible]  membrane protein palmitoylated 1 55kD  MPP1  M64925  Above
14  [illegible]  ubiquitin-conjugating enzyme E2G 2 homologous to yeast UBC7  UBE2G2  AF032456  Above
15  32251_at  hypothetical protein FLJ21174  FLJ21174  AA149307  Above
16  [illegible]  chromosome X open reading frame 5  OFD1  Y15164  Above
17  36620_at  superoxide dismutase 1 soluble amyotrophic lateral sclerosis 1 adult  SOD1  X02317  Above
18  36937_s_at  PDZ and LIM domain 1 elfin  PDLIM1  U90878  Above
19  37326_at  proteolipid protein 2 colonic epithelium-enriched  PLP2  U93305  Above
20  37350_at  clone 889N15 on chromosome Xq22.1-22.3. Contains part of the gene for a novel protein similar to X. laevis Cortical Thymocyte Marker CTX  PSMD10  AL031177  Above
21  38738_at  SMT3 suppressor of mif two 3 yeast homolog 1  SMT3H1  X99584  Above
22  39168_at  Ac-like transposable element  ALTE  AB018328  Above
23  40903_at  ATPase H transporting lysosomal vacuolar proton pump membrane sector associated protein M8-9  APT6M8-9  AL049929  Above
24  32572_at  ubiquitin specific protease 9 X chromosome Drosophila fat facets related  USP9X  X98296  Above
25  1065_at  fms-related tyrosine kinase 3  FLT3  U02687  Above
26  306_s_at  high-mobility group nonhistone chromosomal protein 14  HMG14  J02621  Above



Table 33: Genes selected by DAV/SOM for MLL

No.  Affymetrix number  Gene Name  GeneSymbol  Reference number  Above/Below Mean
1  31492_at  Muscle specific gene  M9  AB019392  Above
2  36777_at  DNA segment on chromosome 12 unique 2489 expressed sequence  D12S2489E  AJ001687  Above
3  39301_at  Calpain 3 p94  CAPN3  X85030  Below
4  41448_at  Homeo box A4  HOXA4  AC004080  Above
5  39424_at  tumor necrosis factor receptor superfamily member 14 herpesvirus entry mediator  TNFRSF14  U70321  Below
6  [illegible]  Tumor protein D52-like 2  TPD52L2  AF004430  Above
7  40493_at  Human cell surface glycoprotein CD44 (CD44) gene, 3' end of long tailed isoform.  CD44  L05424  Above
8  40506_s_at  Homo sapiens polyadenylate binding protein mRNA, complete cds.  -  U75686  Above
9  40514_at  hypothetical 43.2 Kd protein  LOC51614  AF091085  Above
10  40763_at  Meis 1 mouse homolog  MEIS1  U85707  Above
11  40797_at  a disintegrin and metalloproteinase domain 10  ADAM10  AF009615  Above
12  40798_s_at  a disintegrin and metalloproteinase domain 10  ADAM10  Z48579  Above
13  41747_s_at  myocyte-specific enhancer factor 2A (MEF2A) gene  MEF2A  U49020  Above
14  32193_at  Plexin C1  PLXNC1  AF030339  Above
15  32215_i_at  KIAA0878 protein  KIAA0878  AB020685  Above
16  33412_at  LGALS1 Lectin, galactoside-binding, soluble, 1 (galectin 1)  LGALS1  AI535946  Above
17  34306_at  muscleblind Drosophila like  MBNL  AB007888  Above
18  34785_at  KIAA1025 protein  KIAA1025  AB028948  Above
19  35298_at  eukaryotic translation initiation factor 3 subunit 7 zeta 66/67kD  EIF3S7  U54558  Above
20  36690_at  Nuclear receptor subfamily 3 group C member 1  NR3C1  M10901  Above
21  37675_at  solute carrier family 25 mitochondrial carrier phosphate carrier member 3  [illegible]  [illegible]  Above
22  38391_at  capping protein actin filament gelsolin-like  CAPG  M94345  Above
23  38413_at  defender against cell death 1  DAD1  D15057  Above
24  39110_at  eukaryotic translation initiation factor 4B  EIF4B  X55733  Above
25  39867_at  Tu translation elongation factor mitochondrial  TUFM  S75463  Above
26  2062_at  Insulin-like growth factor binding protein 7  IGFBP7  [illegible]  Above
27  2036_s_at  CD44 antigen homing function and Indian blood group system  CD44  [illegible]  Above
28  1914_at  Cyclin A1  CCNA1  U66838  Above
29  1327_s_at  mitogen-activated protein kinase kinase kinase 5  MAP3K5  U67156  Above
30  1126_s_at  Human cell surface glycoprotein CD44 (CD44) gene, 3' end of long tailed isoform.  CD44  L05424  Above
31  1102_s_at  Nuclear receptor subfamily 3 group C member 1  NR3C1  M10901  Above
32  873_at  homeo box A5  HOXA5  [illegible]  Above
33  706_at  Glucocorticoid receptor, beta  -  HG4582-HT4987  Above
34  657_at  protocadherin gamma subfamily C 3  PCDHGC3  L11373  Above




Table 34. Genes selected by DAV/SOM for Novel Class

No.  Affymetrix number  Gene Name  GeneSymbol  Reference number  Above/Below Mean
1  33137_at  latent transforming growth factor beta binding protein 4  LTBP4  Y13622  Above
2  38081_at  leukotriene A4 hydrolase  LTA4H  J03459  Above
3  38661_at  seb4D  HSRNASEB  X75314  Above
4  39878_at  protocadherin 9  PCDH9  AI524125  Above
5  35260_at  KIAA0867 protein  MONDOA  AB020674  Above
6  1373_at  transcription factor 3 E2A immunoglobulin enhancer binding factors E12/E47  TCF3  M31523  Above
7  35177_at  KIAA0725 protein  KIAA0725  AB018268  Above
8  38618_at  Human PAC clone RP3-515N1 from 22q11.2-q22  LIMK2  AC002073  Above
9  34947_at  phorbolin-like protein MDS019  MDS019  AA442560  Above
10  40692_at  transducin-like enhancer of split 4 homolog of Drosophila E spl  TLE4  M99439  Above
11  38364_at  BCE-1 protein  BCE-1  AF068197  Above
12  37960_at  carbohydrate chondroitin 6/keratan sulfotransferase 2  CHST2  AB014679  Above
13  994_at  Protein tyrosine phosphatase receptor type M  PTPRM  X58288  Above
14  31892_at  Protein tyrosine phosphatase receptor type M  PTPRM  X58288  Above
15  995_g_at  Protein tyrosine phosphatase receptor type M  PTPRM  X58288  Above
16  41073_at  G protein-coupled receptor 49  GPR49  AI743745  Above
17  41708_at  KIAA1034 protein  KIAA1034  [illegible]  Above
18  34376_at  protein kinase cAMP-dependent catalytic inhibitor gamma  PKIG  AB019517  Below
19  37978_at  quinolinate phosphoribosyltransferase nicotinate-nucleotide pyrophosphorylase carboxylating  QPRT  D78177  Below
20  38717_at  DKFZP586A0522 protein  DKFZP586A0522  AL050159  Below
21  [illegible]  Human transcript of unrearranged immunoglobulin VH5 pseudogene  -  X58398  Above
22  [illegible]  LIM and SH3 protein 1  LASP1  X82456  Below
23  [illegible]  conserved gene amplified in osteosarcoma  OS4  AF000152  Above
24  41138_at  Antigen identified by monoclonal antibodies 12E7 F21 and O13  MIC2  M16279  Below
25  [illegible]  Moesin  MSN  Z98946  Above
26  [illegible]  singed Drosophila like sea urchin fascin homolog like  SNL  U03057  Below
27  32562_at  endoglin Osler-Rendu-Weber syndrome 1  ENG  X72012  Below
28  36536_at  schwannomin interacting protein 1  SCHIP-1  AF070614  Below
29  36650_at  cyclin D2  CCND2  D13639  Below
30  39756_g_at  X-box binding protein 1  XBP1  Z93930  Above
31  34168_at  deoxynucleotidyltransferase terminal  DNTT  M11722  Above
32  1389_at  membrane metallo-endopeptidase neutral endopeptidase enkephalinase CALLA CD10  MME  J03779  Below
33  41213_at  peroxiredoxin 1  PRDX1  X67951  Above
34  36571_at  Topoisomerase DNA II beta 180kD  TOP2B  X68060  Above
35  253_g_at  clone GPCR W G protein-linked receptor gene (GPCR) gene, 5' end of cds.  -  L42324  Below
36  252_at  clone GPCR W G protein-linked receptor gene (GPCR) gene, 5' end of cds.  -  L42324  Above
37  2087_s_at  cadherin 11 type 2 OB-cadherin osteoblast  CDH11  D21254  Above
38  36976_at  cadherin 11 type 2 OB-cadherin osteoblast  CDH11  D21255  Above



Table 35. Genes selected by DAV/SOM for T-ALL

No.  Affymetrix number  Gene Name  GeneSymbol  Reference number  Above/Below Mean
1  35016_at  Human Ia-associated invariant gamma-chain gene, exon 8, clones lambda-y(1,2,3).  -  M13560  Below
2  36277_at  membrane protein (CD3-epsilon) gene  CD3E  M23323  Above
3  38147_at  SH2 domain protein 1A Duncan s disease lymphoproliferative syndrome  SH2D1A  AL023657  Above
4  38949_at  protein kinase C theta  PRKCQ  L01087  Above
5  32649_at  transcription factor 7 T-cell specific HMG-box  TCF7  X59871  Above
6  33238_at  Human T-lymphocyte specific protein tyrosine kinase p56lck (LCK) aberrant mRNA, complete cds.  LCK  U23852  Above
7  35643_at  nucleobindin 2  NUCB2  X76732  Above
8  36473_at  ubiquitin specific protease 20  USP20  AB023220  Above
9  38319_at  CD3D antigen delta polypeptide TiT3 complex  CD3D  AA919102  Above
10  39709_at  selenoprotein W 1  SEPW1  U67171  Above
11  40775_at  integral membrane protein 2A  ITM2A  AL021786  Above
12  32794_g_at  T cell receptor beta locus  TRB  X00437  Above
13  37039_at  major histocompatibility complex class II DR alpha  HLA-DRA  J00194  Below
14  38051_at  mal T-cell differentiation protein  MAL  X76220  Above
15  38095_i_at  major histocompatibility complex class II DP beta 1  HLA-DPB1  M83664  Below
16  38096_f_at  major histocompatibility complex class II DP beta 1  HLA-DPB1  M83664  Below
17  38415_at  protein tyrosine phosphatase type IVA member 2  PTP4A2  U14603  Above
18  38833_at  Human mRNA for SB classII histocompatibility antigen alpha-chain  -  X00457  Below
19  2059_s_at  lymphocyte-specific protein tyrosine kinase  LCK  M36881  Above
20  1241_at  protein tyrosine phosphatase type IVA member 2  PTP4A2  U14603  Above
21  1105_s_at  T cell receptor beta locus  TRB  M12886  Above



Table 36: Genes selected by DAV/SOM for TEL-AML1

Affymetrix number  Gene Name  GeneSymbol  Reference number  Above/Below Mean

1  31508_at  upregulated by 1,25-dihydroxyvitamin D-3  VDUP1  S73591  Above
2  33690_at  cDNA DKFZp434A202 from clone DKFZp434A202  AL080190  Above
3  34481_at  vav proto-oncogene, exon 27, and complete cds.  VAV  AF030227  Above
4  36239_at  POU domain class 2 associating factor 1  POU2AF1  Z49194  Above
5  37470_at  Leukocyte-associated Ig-like receptor 1  LAIR1  AF013249  Above
6  38203_at  Potassium intermediate/small conductance calcium-activated channel subfamily N member 1  KCNN1  U69883  Above
7  38570_at  major histocompatibility complex class II DO beta  HLA-DOB  X03066  Above
8  38578_at  tumor necrosis factor receptor superfamily member 7  TNFRSF7  M63928  Above
9  38906_at  spectrin alpha erythrocytic 1 elliptocytosis 2  SPTA1  M61877  Above
10  40729_s_at  nuclear factor of kappa light polypeptide gene enhancer in B-cells inhibitor-like 1  NFKBIL1  Y14768  Above
11  40745_at  adaptor-related protein complex 1 beta 1 subunit  AP1B1  L13939  Above
12  41097_at  telomeric repeat binding factor 2  TERF2  AF002999  Above
13  41381_at  KIAA0308 protein  KIAA0308  AB002306  Above
14  41442_at  core-binding factor runt domain alpha subunit 2 translocated to 3  CBFA2T3  AB010419  Above
15  31898_at  KIAA0212 gene product  KIAA0212  D86967  Above
16  32660_at  KIAA0342 gene product  KIAA0342  AB002340  Above
17  34194_at  cDNA FLJ21697 fis clone COL09740  AL049313  Above
18  35614_at  transcription factor-like 5 basic helix-loop-helix  TCFL5  AB012124  Above
19  35665_at  Phosphoinositide-3-kinase class 3  PIK3C3  Z46973  Above
20  36008_at  protein tyrosine phosphatase type IVA member 3  PTP4A3  AF041434  Above
21  36524_at  Rho guanine nucleotide exchange factor GEF4  ARHGEF4  AB029035  Above
22  36537_at  Rho-specific guanine nucleotide exchange factor p114  P114-RHO-GEF  AB011093  Above
23  37280_at  MAD mothers against decapentaplegic Drosophila homolog 1  MADH1  U59912  Above
24  38652_at  hypothetical protein FLJ20154  FLJ20154  AF070644  Above
25  41200_at  CD36 antigen collagen type I receptor thrombospondin receptor like 1  CD36L1  Z22555  Above
26  32224_at  KIAA0769 gene product  KIAA0769  AB018312  Above
27  36985_at  isopentenyl-diphosphate delta isomerase  IDI1  X17025  Above
28  38124_at  midkine neurite growth-promoting factor 2  MDK  X55110  Above
29  39824_at  ESTs  AI391564  Above
30  40570_at  forkhead box O1A rhabdomyosarcoma  FOXO1A  AF032885  Above
31  41498_at  KIAA0911 protein  KIAA0911  AB020718  Above
32  41814_at  fucosidase alpha-L- 1 tissue  FUCA1  M29877  Above
33  32579_at  SWI/SNF related matrix associated actin dependent regulator of chromatin subfamily a member 4  SMARCA4  D26156  Above
34  33162_at  insulin receptor  INSR  X02160  Above
35  1779_s_at  pim-1 oncogene  PIM1  M16750  Above
36  1488_at  protein tyrosine phosphatase receptor type K  PTPRK  L77886  Above
37  1325_at  MAD mothers against decapentaplegic Drosophila homolog 1  MADH1  U59423  Above
38  1336_s_at  protein kinase C beta 1  PRKCB1  X06318  Above
39  1299_at  Telomeric repeat binding factor 2  TERF2  X93512  Above
40  1217_g_at  protein kinase C beta 1  PRKCB1  X07109  Above
41  1077_at  recombination activating gene 1  RAG1  M29474  Above
42  932_i_at  zinc finger protein 91 HPF7 HTF10  ZNF91  L11672  Above
43  880_at  FK506-binding protein 1A 12kD  FKBP1A  M34539  Above
44  755_at  inositol 1 4 5-triphosphate receptor type 1  ITPR1  D26070  Above
45  577_at  midkine neurite growth-promoting factor 2  MDK  M94250  Above
46  160029_at  protein kinase C beta 1  PRKCB1  X07109  Above



C. Comparison of genes selected by the different metrics

There is a high degree of overlap between the genes chosen by the various
metrics; however, the top-ranked genes for each metric differ. Despite this, the top
genes selected by each metric are all able to accurately identify the leukemia
risk groups, as detailed below. As a result, a limited number of genes can be used to
accurately identify the genetic subtypes, and one can use non-overlapping lists and still
achieve high prediction accuracy. Thus, there are many genes that are distinct
discriminators of these seven risk groups, and one need only use a small subset of
these in a supervised learning algorithm to accurately identify a case as belonging to
a given genetic subtype.

D. Decision tree for the diagnosis of genetic subtypes

Classification was approached using a decision tree format, in which the first
decision was T-ALL versus B-lineage (non-T-ALL). Within the B-lineage subset,
cases were then sequentially classified into the known risk groups characterized by
the presence of E2A-PBX1, TEL-AML1, BCR-ABL, MLL chimeric genes, and lastly
hyperdiploid >50 chromosomes. Cases not assigned to one of these classes were left
unassigned. Classification was performed using the supervised learning algorithms
described below.
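
The sequential decisions described above can be sketched as a simple cascade. This is only an illustrative sketch: the names `SUBTYPE_ORDER`, `classify_case`, and the per-subtype binary classifiers passed in as callables are hypothetical, and any of the supervised learners described below could stand behind each callable.

```python
from typing import Callable, Dict, List, Optional

# Order of the sequential decisions described in the text:
# T-ALL first, then the B-lineage risk groups, hyperdiploid >50 last.
SUBTYPE_ORDER: List[str] = [
    "T-ALL", "E2A-PBX1", "TEL-AML1", "BCR-ABL", "MLL", "Hyperdiploid>50",
]

Profile = Dict[str, float]                  # gene or probe id -> expression value
BinaryClassifier = Callable[[Profile], bool]  # hypothetical "is this subtype?" test


def classify_case(profile: Profile,
                  classifiers: Dict[str, BinaryClassifier]) -> Optional[str]:
    """Return the first subtype whose classifier accepts the profile.

    Returns None when no classifier fires, i.e. the case is left unassigned.
    """
    for subtype in SUBTYPE_ORDER:
        if classifiers[subtype](profile):
            return subtype
    return None
```

Because the cascade stops at the first positive decision, a case that would satisfy two classifiers is assigned to the subtype that comes earlier in the order.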

E. Description of Supervised Learning Algorithms

An analysis of the profiles was performed using a linear classifier, C4.5, and a variety
of different non-linear classifiers. The non-linear classifiers consistently outperformed
the linear classifier. Therefore, only the description and data from non-linear
classifiers are included below.

1. Support Vector Machine (SVM)

A support vector machine (SVM) selects a small number of critical boundary
instances from each class and builds a linear discriminant function that separates them
as widely as possible (Witten and Frank, Data Mining: Practical Machine Learning
Tools and Techniques with Java Implementations, Morgan Kaufmann, 1999, herein
incorporated by reference). In the case where no linear separation is possible, the
"kernel" technique is used to automatically map the training instances into a
higher dimensional space, and a separator is learned in that space. The Weka version
of SVM developed at the University of Waikato in New Zealand
(www.cs.waikato.ac.nz/ml/weka), which implements Platt's sequential minimal
optimization algorithm for training a support vector classifier using polynomial
kernels, was used (Platt, "Fast Training of Support Vector Machines Using Sequential
Minimal Optimization," Advances in Kernel Methods - Support Vector Learning,
Schölkopf et al., eds., MIT Press, 1998, herein incorporated by reference).

2. Prediction by Collective Likelihood of Emerging Patterns (PCL)

Emerging patterns (EPs) are a notion used in data mining to discover sharp
differences between two classes of data (Dong and Li, "Efficient Mining of Emerging
Patterns: Discovering Trends and Differences," Proc. 5th ACM SIGKDD
International Conference on Knowledge Discovery and Data Mining, pp. 43-52
(1999), herein incorporated by reference). An EP is a pattern (in our case, the
expression level of several genes) whose frequency increases significantly from one
class of samples to another class. In particular, the most general patterns that have
infinite growth were identified: their frequency in one class is 0%, their frequency in
another class is greater than 0%, and none of their proper subpatterns are EPs. These
EPs can then be combined into reliable rules for subtype prediction. Three earlier
methods for classification based on EPs are JEP (Li et al. (2001) Knowledge and
Information Systems 3:131-45, herein incorporated by reference), DeEPs (Li et al.,
"DeEPs: Instance-based Classification by Emerging Patterns," Proc. 4th European
Conference on Principles and Practice of Knowledge Discovery in Databases, pp.
191-200, 2000, herein incorporated by reference), and CAEP (Dong et al., "CAEP:
Classification by Aggregating Emerging Patterns," Proc. 2nd International
Conference on Discovery Science, pages 30-42, 1999, herein incorporated by
reference).
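
To make the definition of a most general, infinite-growth EP concrete, the following is a minimal brute-force sketch. It assumes, purely for illustration, that each sample has already been discretized into a set of items such as "geneA:high"; the function names are hypothetical, and the cited mining algorithms are far more efficient than this exhaustive subset check.

```python
from itertools import combinations
from typing import FrozenSet, Sequence

Pattern = FrozenSet[str]   # e.g. frozenset({"geneA:high", "geneB:low"})


def frequency(pattern: Pattern, samples: Sequence[Pattern]) -> float:
    """Fraction of samples that contain every item of the pattern."""
    if not samples:
        return 0.0
    return sum(pattern <= s for s in samples) / len(samples)


def has_infinite_growth(pattern: Pattern,
                        home: Sequence[Pattern],
                        other: Sequence[Pattern]) -> bool:
    """True when the pattern occurs in the home class and never in the other."""
    return frequency(pattern, home) > 0 and frequency(pattern, other) == 0


def is_most_general_ep(pattern: Pattern,
                       home: Sequence[Pattern],
                       other: Sequence[Pattern]) -> bool:
    """True when the pattern has infinite growth and no proper subpattern does."""
    if not has_infinite_growth(pattern, home, other):
        return False
    items = sorted(pattern)
    for size in range(1, len(items)):
        for subset in combinations(items, size):
            if has_infinite_growth(frozenset(subset), home, other):
                return False
    return True
```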

In this analysis an original variation in the spirit of JEP, but with a different
manner of aggregating EPs, was used. Given two training data sets $D_P$ and $D_N$ and a
testing sample $T$, the first phase was to discover EPs from $D_P$ and $D_N$. Denote the EPs
of $D_P$, in descending order of frequency, as $\mathrm{TopEP}^P_1, \ldots, \mathrm{TopEP}^P_i$,
and those of $D_N$ as $\mathrm{TopEP}^N_1, \ldots, \mathrm{TopEP}^N_j$. Suppose $T$ contains the
following EPs of $D_P$: $\mathrm{TopEP}^P_{i_1}, \ldots, \mathrm{TopEP}^P_{i_x}$, where
$i_1 < i_2 < \ldots < i_x \le i$; and the following EPs of $D_N$:
$\mathrm{TopEP}^N_{j_1}, \ldots, \mathrm{TopEP}^N_{j_y}$, where $j_1 < j_2 < \ldots < j_y \le j$.
In the next step, two scores were calculated for $T$:

$$\mathrm{score}_P = \sum_{m=1}^{k} \frac{\mathrm{frequency}(\mathrm{TopEP}^P_{i_m})}{\mathrm{frequency}(\mathrm{TopEP}^P_{m})},
\qquad
\mathrm{score}_N = \sum_{m=1}^{k} \frac{\mathrm{frequency}(\mathrm{TopEP}^N_{j_m})}{\mathrm{frequency}(\mathrm{TopEP}^N_{m})},$$

where $k \ll i$ and $k \ll j$. In this case, $k$ is chosen to be 25. Finally, a prediction
is made on $T$ as follows: if $\mathrm{score}_P > \mathrm{score}_N$, then $T$ is predicted to be
in class $D_P$; otherwise, it is predicted as class $D_N$.

The spirit of this variation is to measure how far the top k EPs contained in T
are from the top k EPs of a class. For example, if k = 1, then $\mathrm{score}_P$ indicates
whether the number-one EP contained in T is far from the most frequent EP of $D_P$. If
the score is the maximum value 1, then the "distance" is very small; namely, the most
common property of $D_P$ is also present in this testing sample. With smaller scores, the
distance becomes greater and the likelihood of T belonging to $D_P$ becomes weaker.
Using more than one top-ranked EP in this way leads to very reliable predictions.
This variation of the EP-based classification method was termed "prediction by
collective likelihood of EPs," or PCL for short.
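
The PCL scoring rule can be sketched in a few lines. The sketch assumes that the EPs of each class have already been mined and are supplied as (pattern, frequency) pairs sorted by descending frequency, and that a sample is represented as a set of discretized items; the names `pcl_score` and `pcl_predict` are hypothetical. One further assumption, not stated in the text, is that when the sample contains fewer than k EPs of a class the sum simply runs over the EPs that are present.

```python
from typing import FrozenSet, Sequence, Tuple

Pattern = FrozenSet[str]
RankedEPs = Sequence[Tuple[Pattern, float]]  # (pattern, frequency), most frequent first


def pcl_score(sample: Pattern, ranked_eps: RankedEPs, k: int = 25) -> float:
    """Sum over m = 1..k of freq(m-th top EP contained in sample) / freq(m-th top EP)."""
    # Frequencies of the class EPs that the sample contains, in rank order.
    contained = [freq for pattern, freq in ranked_eps if pattern <= sample]
    score = 0.0
    for m in range(min(k, len(contained))):
        score += contained[m] / ranked_eps[m][1]
    return score


def pcl_predict(sample: Pattern, eps_p: RankedEPs, eps_n: RankedEPs,
                k: int = 25) -> str:
    """Return "P" when score_P > score_N, otherwise "N"."""
    if pcl_score(sample, eps_p, k) > pcl_score(sample, eps_n, k):
        return "P"
    return "N"
```

With k = 1 the score reduces to the ratio discussed in the text: it equals 1 when the sample contains the single most frequent EP of the class and shrinks as the best EP found in the sample becomes rarer.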

3. k-Nearest Neighbor (k-NN)

k-NN is a typical instance-based learner in which the class of a new instance is
decided by the majority class of its k closest neighbors (Cover and Hart (1967) IEEE
Transactions on Information Theory 13:21-27, herein incorporated by reference).
This method was used with the Euclidean distance metric. Conceptually, this is one
of the most straightforward methods and is often used as a baseline for comparison
purposes. The data were normalized using the z-score method, then the "best" few
genes were chosen using one of the statistical gene selection methods. For these
experiments, the "top n" genes, where n = 1-50, were used. The expression values of
the top genes from each diagnostic sample were treated as a vector in n-dimensional
space. To classify a new sample, the same top n genes were chosen, and the
Euclidean distance was computed between this new vector and each vector in the
training data. The prediction was made by a majority vote of the k nearest samples,
where k=1 or k=3. In this experiment, k was set to 1.
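
The z-score normalization and the k-NN vote described above might look roughly as follows. This is a sketch under assumptions: it uses the population standard deviation, leaves genes with zero variance unscaled, and expects the query to have been normalized with the same statistics as the training data; the function names are illustrative only.

```python
import math
from collections import Counter
from typing import List, Sequence


def zscore_columns(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale each gene (column) to zero mean and unit standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Population standard deviation; a zero value is replaced by 1.0 (assumption).
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def knn_predict(train: Sequence[Sequence[float]], labels: Sequence[str],
                query: Sequence[float], k: int = 1) -> str:
    """Majority vote among the k training vectors closest in Euclidean distance."""
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train, labels))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

With k = 1, as in the experiment described, the vote reduces to copying the label of the single nearest training sample.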

4. Artificial Neural Network (ANN)

The artificial neural network (ANN) learning models built are all feed-forward,
fully connected, and non-recurrent. The input layer of each ANN contains
50 units, which correspond to the 50 input values (the "top 50" scoring genes). Each
ANN has one hidden layer with 4 units, and an output layer that contains two units,
which represent the two class labels. In a preprocessing step, all input data were
normalized using the z-score method. The apparent error was estimated using 3-fold
cross-validation. That is, for each training procedure, the training samples were
randomly shuffled and divided into three groups of approximately equal size. A
model was built with two of the groups and the third group was set aside for
validation. This step was repeated three times, each time with a different group for
validation. This shuffling-training process was repeated ten times, resulting in 30
ANN models. Each test sample was fed into each of the 30 ANN models, and the
output was the average of the 30 outputs. The class predicted was the one that was
represented by the output unit with the larger average output value.
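
The resampling scheme (ten shuffles, three folds each, thirty models) and the averaging of the two output units can be sketched independently of any particular network library. In the sketch below each trained model is stood in for by a hypothetical callable returning a pair of output values; the function names and the fixed random seed are assumptions made for illustration.

```python
import random
from typing import Callable, List, Sequence, Tuple

Split = Tuple[List[int], List[int]]  # (training indices, validation indices)


def repeated_fold_splits(n_samples: int, repeats: int = 10, folds: int = 3,
                         seed: int = 0) -> List[Split]:
    """Shuffle the sample indices `repeats` times and hold out each fold once."""
    rng = random.Random(seed)
    splits: List[Split] = []
    for _ in range(repeats):
        order = list(range(n_samples))
        rng.shuffle(order)
        groups = [order[f::folds] for f in range(folds)]  # near-equal sizes
        for held_out in range(folds):
            train = [i for g, grp in enumerate(groups) if g != held_out for i in grp]
            splits.append((train, groups[held_out]))
    return splits


def ensemble_predict(models: Sequence[Callable[[Sequence[float]], Tuple[float, float]]],
                     x: Sequence[float]) -> int:
    """Average both output units over all models; return the index of the larger mean."""
    outputs = [model(x) for model in models]
    mean0 = sum(o[0] for o in outputs) / len(outputs)
    mean1 = sum(o[1] for o in outputs) / len(outputs)
    return 0 if mean0 >= mean1 else 1
```

One model would be trained per split, giving 10 x 3 = 30 models, and `ensemble_predict` then plays the role of the averaging step applied to each test sample.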

F. Table of results using the different algorithms to predict the genetic subgroups

A summary of the true prediction accuracy on the blinded test set of 112 cases
is presented in Tables 37-39. Sensitivity was calculated as the number of positive
samples predicted/the number of true positives. Specificity was calculated as the
number of negative samples predicted/the number of true negatives.
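
The three quantities reported in the tables can be computed from the true and predicted labels as sketched below. The sketch reads the definitions above in the conventional way (correctly predicted positives over all true positives, and correctly predicted negatives over all true negatives), which is an interpretation rather than something the text spells out; the function name is hypothetical and both classes are assumed to be present in the test set.

```python
from typing import Sequence, Tuple


def accuracy_sensitivity_specificity(y_true: Sequence[str], y_pred: Sequence[str],
                                     positive: str) -> Tuple[float, float, float]:
    """Return (accuracy, sensitivity, specificity) as percentages for one subtype."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    accuracy = 100.0 * (tp + tn) / len(pairs)
    sensitivity = 100.0 * tp / (tp + fn)   # assumes at least one true positive
    specificity = 100.0 * tn / (tn + fp)   # assumes at least one true negative
    return accuracy, sensitivity, specificity
```

When a subtype has only a handful of positive cases in the test set, a single missed case moves the sensitivity by a large step while the overall accuracy barely changes.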

Table 37. True Prediction Accuracy Results on Test Set using SVM and ANN algorithms

                            SVM                             ANN
                            Chi Sq   CFS   T-stats          [illegible]   Wilkins'
T-ALL      True Accuracy    100      100   100              100           100
           Sensitivity      100      100   100              100           100
           Specificity      100      100   100              100           100
E2A-PBX1   True Accuracy    100      100   100              100           100
           Sensitivity      100      100   100              100           100
           Specificity      100      100   [missing]        100           100
TEL-AML1   True Accuracy    99       99    [illegible]      [illegible]   100
           Sensitivity      100      100   100              100           100
           Specificity      98       98    [illegible]      [illegible]   100
BCR-ABL    True Accuracy    95       97    94               [illegible]   [illegible]
           Sensitivity      50       67    33               83            [illegible]
           Specificity      100      100   100              98            [illegible]
MLL        True Accuracy    100      98    100              97            [missing]
           Sensitivity      100      100   100              86            100
           Specificity      100      98    100              100           100
H>50       True Accuracy    96       96    96               95            94
           Sensitivity      100      100   100              95            100
           Specificity      93       93    93               93            89


Table 38. True Prediction Accuracy Results on Test Set using k-NN

                            k-NN
                            Chi Sq   CFS   T-stats   Wilkins'
T-ALL      True Accuracy    100      100   100       100
           Sensitivity      100      100   100       100
           Specificity      100      100   100       100
E2A-PBX1   True Accuracy    100      100   100       100
           Sensitivity      100      100   100       100
           Specificity      100      100   100       100
TEL-AML1   True Accuracy    98       98    99        100
           Sensitivity      100      96    96        100
           Specificity      97       98    100       100
BCR-ABL    True Accuracy    94       97    95        93
           Sensitivity      33       67    50        67
           Specificity      100      100   100       96
MLL        True Accuracy    100      98    95        100
           Sensitivity      100      83    100       100
           Specificity      100      100   94        100
H>50       True Accuracy    98       96    94        98
           Sensitivity      100      100   95        100
           Specificity      96       93    93        96


Table 39. True Prediction Accuracy Results on Test Set using PCL

                            PCL
                            Chi Sq   CFS
T-ALL      True Accuracy    100      100
           Sensitivity      100      100
           Specificity      100      100
E2A-PBX1   True Accuracy    ND       100
           Sensitivity      ND       100
           Specificity      ND       100
TEL-AML1   True Accuracy    99       ND
           Sensitivity      96       ND
           Specificity      100      ND
BCR-ABL    True Accuracy    97       ND
           Sensitivity      67       ND
           Specificity      100      ND
MLL        True Accuracy    100      ND
           Sensitivity      100      ND
           Specificity      100      ND
H>50       True Accuracy    98       ND
           Sensitivity      100      ND
           Specificity      96       ND




The assignment of a leukemic sample to a specific biologic subgroup is more
accurately reflected by its gene expression profile than by the presence or absence of a
specific genetic lesion. For example, four patients whose expression profiles were
classified as TEL-AML1, despite lacking a TEL-AML1 chimeric message by
reverse transcriptase polymerase chain reaction (RT-PCR), were found to have an
alteration in TEL, suggesting a common underlying biology. Thus, from a technical
viewpoint, gene expression profiling provides a viable alternative to standard
diagnostic approaches.

G. Absence of correlation of expression data for genetic subtypes with stage of B-cell
differentiation

The expression profiles of the different risk groups of B-cell leukemias do
not correspond to markers of different stages of B-cell differentiation. The first issue
is defining the stage of B-cell differentiation. The defined stages of BM-derived
B-cells relevant to pediatric ALL are outlined below in Table 40, along with their
frequency in pediatric ALL (Campana and Behm (2000) J. Immunological Methods
243:59-75). Three stages of differentiation are defined by a limited number of
markers. In Table 41 below, the distribution of the leukemia cases into these B-cell
differentiation stages is shown. As can be seen, none of the genetic subtypes is
specifically associated with one of these three stages of differentiation. Thus, this
simple analysis clearly shows that the majority of the chromosomal translocation
subgroups in pediatric ALL do not correspond to a specific stage of B-cell
differentiation. This is a well-known fact in the field of pediatric ALL and differs
from the relationship typically seen between chromosomal translocations and other
genetic lesions, and the stage of differentiation seen in B-cell lymphomas.



Subtype        Leukocyte antigen expression (% of cases positive)    Frequency (%)
               CD19    CD22    cIgμ    sIgμ    sIgκ or λ
Early Pre-B    100     >95     0       0       0                     60-65
Pre-B          100     100     100     0       0                     20-25
Transitional   100     100     100     100     0                     1-3

Abbreviations: cIgμ, cytoplasmic immunoglobulin μ chain; sIgμ, surface immunoglobulin μ chain;
sIgκ or λ, surface immunoglobulin κ or λ chains

a D. Campana and F.G. Behm, "Immunophenotyping of leukemia", Journal of Immunological Methods
243:59-75, 2000.

Table 41. Distribution of genetic subtypes by immunophenotype a

               EARLY PRE-B   PRE-B   TRANSITIONAL PRE-B
E2A            0             17      6
TEL            55            23      0
BCR            11            3       0
MLL            12            6       1
Hyperdip>50    49            9       5
Novel          8             4       1
Total          172           77      24

a For this analysis, samples with other immunophenotypes (NOS or mature B-cell) were not included

The next goal was to determine whether a set of genes could accurately
identify subjects by their stage of differentiation, regardless of leukemia risk group.
To accomplish this, cases were assigned to one of three classes, early pre-B, pre-B,
or transitional pre-B, based on their immunophenotype. The top 50 genes that
distinguished each group from the other two groups were selected using the Wilkins'
metric. These genes were then used in an ANN analysis to assess their performance
in correctly classifying the 273 diagnostic B-lineage ALL samples for which a stage
of differentiation could be determined, through a process of cross validation. The
results of this analysis are included below.

Table 42. Accuracy Results for immunophenotype discrimination using
Wilkins' metric and ANN algorithm

                        Accuracy   Sensitivity   Specificity
Early Pre-B a           78.39%     85.47%        66.34%
Pre-B b                 71.79%     38.96%        84.69%
Transitional Pre-B c    91.24%     33.33%        96.79%

a Cells with CD19+, CD22+, cytoplasmic Igμ-, surface Igμ- immunophenotype
b Cells with CD19+, CD22+, cytoplasmic Igμ+, surface Igμ- immunophenotype
c Cells with CD19+, CD22+, cytoplasmic Igμ+, surface Igμ+ immunophenotype

The selected genes perform rather poorly in correctly assigning cases to specific
B-cell differentiation stages, with accuracies well below those achieved for prediction of
the genetic subgroups. When these genes were used in a two-dimensional hierarchical
clustering algorithm, they failed to cluster cases by immunophenotype, but instead
resulted in the loose clustering of some of the genetic subgroups, including
E2A-PBX1, TEL-AML1, BCR-ABL, MLL, and hyperdiploid >50. The analysis was
repeated using genes selected by DAV and again, no clustering of the
immunophenotypically-defined stages was observed. Thus, it was not possible to
identify expression profiles that can accurately identify the immunophenotypically-defined
differentiation stages of pediatric B-cell ALL. Moreover, the expression
profiles that were defined for the genetic subtypes are not profiles that correspond to
specific stages of B-cell differentiation. Although some of the genes that define
specific genetic subtypes can be associated with a particular stage of B-cell
differentiation, the majority of the discriminating genes show no correlation with
differentiation.



H. Results for relapse prediction

In the prediction of whether a patient would go into continuous complete
remission or would relapse, a subtype-specific approach was adopted. An individual
classifier was constructed for each subtype of ALL. Given a sample, the subtype was
first predicted, and then the corresponding subtype-specific prognostic classifier was
invoked to predict whether the patient would relapse. This subtype-specific approach
was required because an expression profile predictive of relapse for the entire group
could not be defined.

In the construction of the subtype-specific classifiers, genes were selected by CFS
unless this algorithm returned >20 genes, in which case the top 20 ranked genes by
T-statistics were used. When the T-statistics method was used, the selection of how
many among the top 20 T-statistics genes were to be used was made by performing
cross validation experiments; that is, the top n genes for n = 1..20 were tried, and the n
that gave the best cross validation results was selected. The cross validation results
for the optimal choice of genes are summarized in Table 43 below. The genes that
were chosen for use in subtype-specific relapse predictions are summarized in Tables
44-48.
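
The gene-selection rule described above (use the CFS genes unless there are more than 20, otherwise choose how many of the top 20 T-statistics genes to keep by cross validation) can be sketched as follows. The cross validation itself is represented by a hypothetical callback `cv_accuracy`, and keeping the smaller n when two values tie is an assumption not stated in the text.

```python
from typing import Callable, List, Sequence, Tuple


def candidate_genes(cfs_genes: Sequence[str], t_ranked: Sequence[str],
                    limit: int = 20) -> List[str]:
    """Use the CFS genes unless CFS returned more than `limit`; else top T-statistics genes."""
    if len(cfs_genes) <= limit:
        return list(cfs_genes)
    return list(t_ranked[:limit])


def choose_gene_count(ranked_genes: Sequence[str],
                      cv_accuracy: Callable[[Sequence[str]], float],
                      max_n: int = 20) -> Tuple[int, float]:
    """Try the top n genes for n = 1..max_n and return the best (n, accuracy)."""
    best_n, best_acc = 0, float("-inf")
    for n in range(1, min(max_n, len(ranked_genes)) + 1):
        acc = cv_accuracy(ranked_genes[:n])
        if acc > best_acc:          # ties keep the smaller n (assumption)
            best_n, best_acc = n, acc
    return best_n, best_acc
```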

Table 43. Results of relapse prediction on indicated subgroups

            Relapse   CCR   # genes   metric    Accuracy   P value by permutation test
T-ALL       8         26    7         t-stats   97         0.034
H>50        5         43    13        t-stats   100        0.018
TEL-AML1    3         56    7         CFS       100        0.145
MLL         5         7     4         t-stats   100        0.104
Others      4         56    20        t-stats   98.3       0.079



Table 44. Genes selected by T-statistics/CFS for relapse (T-ALL)

Gene Name  GeneSymbol  Reference Number  Above/Below Mean

Human TBXAS1 gene for thromboxane synthase  TBXAS1  D34625  Above
Homo sapiens mRNA for 41-kDa phosphoribosylpyrophosphate synthetase-associated protein  AB007851  Above
Human DNA sequence from PAC 370M22  Z82206  Above
Human spinal muscular atrophy gene  SMA5  X83301  Above
Human cell surface glycoprotein CD44  CD44  L05424  Above
Human mRNA for KIAA0056 gene  KIAA0056  D29954  Above
Human BTK region clone ftp-3 mRNA  U01923  Above



Table 45. Genes selected by T-statistics/CFS for relapse (Hyperdiploid >50)

Affymetrix number  Gene Name  Gene Symbol  Reference Number  Above/Below Mean

1  37721_at  deoxyhypusine synthase  DHPS  U79262  Above
2  38721_at  KIAA1536 protein  KIAA1536  W72733  Above
3  40120_at  hydroxyacyl glutathione hydrolase  HAGH  X90999  Above
4  41386_i_at  KIAA0346 protein  KIAA0346  AB002344  Above
5  38677_at  stress 70 protein chaperone microsome-associated 60kD  STCH  U04[illegible]  Above
6  37620_at  Human TFIID subunits TAF20 and TAF15 mRNA, complete cds.  U57693  Above
7  34703_f_at  EST  [illegible]  Above
8  38355_at  DEAD/H Asp-Glu-Ala-Asp/His box polypeptide Y chromosome  DBY  AF000984  Above
9  41214_at  ribosomal protein S4 Y-linked  RPS4Y  M58459  Above
10  34530_at  Homo sapiens cDNA FLJ22448 fis clone [illegible]  W73822  Above
11  603_at  nuclear receptor subfamily 2 group C member 1  NR2C1  M29960  Above
12  32697_at  inositol myo 1 or 4 monophosphatase 1  IMPA1  AF042729  Above
13  41129_at  KIAA0033 protein  KIAA0033  D26067  Above
14  33333_at  KIAA0403 protein  KIAA0403  AB007863  Above
15  37078_at  CD3Z antigen zeta polypeptide TiT3 complex  CD3Z  J04132  Above
16  38148_at  cryptochrome 1 photolyase-like  CRY1  D83702  Above
17  39150_at  ring finger protein 11  RNF11  U69559  Above
18  33869_at  DKFZp586N1323 from clone DKFZp586N1323  AL080218  Above
19  41447_at  KIAA0990 protein  KIAA0990  AB023207  Above
20  39369_at  KIAA0935 protein  KIAA0935  AB023152  Above



Table 46: Genes selected by T-statistics/CFS for relapse (TEL-AML1)

Affymetrix number  Gene Name  Gene Symbol  Reference number  Above/Below Mean

1  35797_at  Human interleukin-13 gene  IL-13Ra  Y10659  Above
2  37524_at  Human death-associated protein kinase  DRAK2  AB011421  Above
3  34243_i_at  Human l(3)mbt protein homolog mRNA  U89358  Above
4  41398_at  Homo sapiens mRNA. cDNA DKFZp564A186  AL049305  Above
5  35195_at  H. sapiens mRNA for phosphate cyclase  Y11651  Above
6  32393_s_at  Homo sapiens cDNA  W27466  Above
7  31909_at  Homo sapiens mRNA for KIAA0754 protein  KIAA0754  AB018297  Above

Table 47: Genes selected by T-statistics/CFS for relapse (MLL)

Affymetrix number  Gene Name  Gene Symbol  Reference number  Above/Below Mean

1  294_s_at  Protein Kinase Pitslre, Alpha, Alt. Splice 1-Feb  Below
2  38226_at  23h11 Homo sapiens cDNA  W27152  Below
3  1398_g_at  Human protein kinase (MLK-3) mRNA  HUMMLK3A  L32976  Above
4  409_at  Human mRNA for 14.3.3 protein, a protein kinase regulator  X56468  Below



Table 48: Genes selected by T-statistics/CFS for relapse (Others)

Affymetrix number  Gene Name  GeneSymbol  Reference number  Above/Below Mean

1  33782_r_at  mi82f03.s1 Homo sapiens cDNA, 3 end /clone=IMAGE-1090397  AA587372  Above
2  33338_at  Human transcription factor ISGF-3 mRNA  M97936  Above
3  40242_at  Human (clone N5-4) protein p84 mRNA  L36529  Above
4  37018_at  qd05c04.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-1722822  AI189287  Above
5  38337_at  Homo sapiens zinc finger protein mRNA  U62392  Above
6  41464_at  Human mRNA for KIAA0339 gene  KIAA0339  AB002337  Above
7  38064_at  H.sapiens lrp mRNA  LRP  X79882  Above
8  33173_g_at  yc89b05.r1 Homo sapiens cDNA, 5 end /clone=IMAGE-23231  T75292  Below
9  33365_at  Homo sapiens mRNA for KIAA0945 protein  KIAA0945  AB023162  Above
10  39367_at  ni38e08.s1 Homo sapiens cDNA, 3 end /clone=IMAGE-979142  AA522537  Above
11  41108_at  Homo sapiens mRNA for putative GTP-binding protein  PGPL  Y14391  Above
12  37304_at  Homo sapiens heterochromatin protein p25 mRNA  P25beta  U35451  Below
13  40359_at  Human DNA-binding protein (HRC1) mRNA  HRC1  M91083  Above
14  32792_at  Human DNA sequence from clone 465N24 on chromosome 1p35.1-36.13. Contains two novel genes, ESTs, GSSs and CpG islands  AL031432  Above
15  34726_at  Human voltage-gated calcium channel beta subunit mRNA  U07139  Above
16  40299_at  Homo sapiens G-protein coupled receptor RE2 mRNA  AF091890  Above
17  40704_at  H.sapiens mRNA for phosphatidylinositol 3-kinase  Z29090  Above
18  [illegible]_at  [illegible] protein mRNA  U82939  Above
19  32038_s_at  wi30c12.x1 Homo sapiens cDNA, 3 end /clone=IMAGE-2391766  AI739308  Above
20  39613_at  H.sapiens HUMM9 mRNA  X74837  Above



I. Permutation test results

As the number of relapse samples was small, in addition to the usual cross validation
experiments, 1000 permutation experiments were performed for each subtype-specific
relapse study. In each permutation experiment, the samples were re-partitioned in a
manner that preserved class size by randomly swapping the class labels ("relapse" or
"continuous complete remission"). The same metric was then employed to pick the
same number of genes as in the original partitioning of the samples given by the
original class labels. SVM was then used to obtain a prediction accuracy by cross
validation for this random partition using these freshly selected genes. The
percentage of these 1000 permutation experiments that achieved an accuracy at least
as high as that of the original partition was taken as a p-value, which gave an
indication of how many random partitions of the original samples could achieve the
same accuracy as the original samples. The results of these permutation experiments
are summarized in the last column of Table 43 above. These results show that the
high accuracy obtained on the predictability of relapse in T-lineage ALL,
Hyperdiploid>50, and others is unlikely to be a random event. The higher p-values
obtained for the subtypes of TEL-AML1 and MLL are probably due to the small
number of relapse samples available for analysis.
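
The permutation procedure can be summarized in a short sketch. Gene re-selection and SVM cross validation on each relabeled data set are represented by a single hypothetical callback, `evaluate`, which takes a list of class labels and returns a cross validation accuracy; counting permutations that reach at least the observed accuracy is this sketch's reading of the p-value described above.

```python
import random
from typing import Callable, List, Sequence


def permutation_p_value(labels: Sequence[str],
                        observed_accuracy: float,
                        evaluate: Callable[[List[str]], float],
                        n_permutations: int = 1000,
                        seed: int = 0) -> float:
    """Fraction of label permutations whose accuracy reaches the observed one."""
    rng = random.Random(seed)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)   # shuffling labels preserves the class sizes
        if evaluate(shuffled) >= observed_accuracy:
            hits += 1
    return hits / n_permutations
```

Because the labels are only shuffled, every permuted data set keeps the original numbers of relapse and remission cases, matching the class-size-preserving re-partitioning described in the text.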

Table 49. Permutation test results for predictors of T-ALL relapse

Rank   Affymetrix number   t-statistic value   Perm 1%   Perm 5%   neighbors
1      33777_at            7.8337              7.3774    5.4783    6
2      41853_at            6.1727              6.5948    4.8117    16
3      38866_at            5.9890              6.0293    4.5611    12
4      41643_at            5.6106              5.6815    4.3877    12
5      1126_s_at           5.4777              5.5162    4.2375    11
6      41862_at            5.3734              5.3759    4.1208    11
7      41131_f_at          4.9134              5.2280    4.0295    17

Table 50. Permutation test results for predictors of Hyperdiploid >50 relapse

Rank   Affymetrix number   t-statistic value   Perm 1%   Perm 5%   neighbors
1      37721_at            8.7160              12.7358   9.9506    75
2      38721_at            8.4162              10.7256   8.8438    59
3      40120_at            7.2736              9.9837    8.0383    73
4      41386_i_at          6.3436              9.0552    7.5579    88
5      38677_at            6.2698              8.8633    7.2466    88
6      37620_at            6.2174              8.4154    6.9604    82
7      34703_f_at          6.0770              8.0982    6.8835    83
8      38355_at            5.5120              7.8657    6.7434    92
9      41214_at            5.4262              7.6583    6.6094    90
10     34530_at            5.4013              7.5991    6.5109    87
11     603_at              5.3142              7.5903    6.4409    87
12     32697_at            5.1785              7.5146    6.3265    90
13     41129_at            5.1450              7.3939    6.2121    88
14     33333_at            5.1061              7.2601    6.1389    87
15     37078_at            5.0738              7.1484    6.0308    86
16     38148_at            4.9256              6.9688    5.9230    93
17     39150_at            4.9061              6.9273    5.9015    93
18     33869_at            4.8256              6.8900    5.8367    93
19     41447_at            4.7919              6.8135    5.7621    93
20     39369_at            4.7790              6.7731    5.7391    92



Individually, the discriminating genes for relapse in T-ALL are significant at either 
the 1% or 5% level, while those for hyperdiploid >50 fall at approximately the 7% 
level. 
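
The comparison behind this statement can be sketched as follows. This is a
minimal illustration only, assuming that the "Perm 1%" and "Perm 5%" columns
hold the statistic values reached by 1% and 5% of the permuted data sets; the
function name significance_level is hypothetical, and the numbers are copied
from rows of Tables 49 and 50.

```python
def significance_level(observed, perm_1pct, perm_5pct):
    """Return the permutation level that an observed statistic reaches.

    Assumption: a gene is significant at a level when its observed statistic
    is at least as large as the permutation threshold for that level.
    """
    if observed >= perm_1pct:
        return "1%"
    if observed >= perm_5pct:
        return "5%"
    return "not significant at 5%"


# (probe set, observed t-statistic, Perm 1%, Perm 5%) taken from Table 49
t_all_rows = [
    ("33777_at", 7.8337, 7.3774, 5.4783),
    ("41853_at", 6.1727, 6.5948, 4.8117),
    ("38866_at", 5.9890, 6.0293, 4.5611),
]

# (probe set, observed t-statistic, Perm 1%, Perm 5%) taken from Table 50
hyperdiploid_rows = [
    ("38721_at", 8.4162, 10.7256, 8.8438),
    ("41386_i_at", 6.3436, 9.0552, 7.5579),
]

levels_t_all = [significance_level(o, p1, p5) for _, o, p1, p5 in t_all_rows]
levels_hyperdiploid = [
    significance_level(o, p1, p5) for _, o, p1, p5 in hyperdiploid_rows
]

for (probe, *_), level in zip(t_all_rows + hyperdiploid_rows,
                              levels_t_all + levels_hyperdiploid):
    print(probe, level)
```

Under this reading, the top T-ALL gene clears the 1% threshold and the next
two clear only the 5% threshold, while the hyperdiploid >50 genes shown fall
short of the 5% threshold.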


Table 51. Results of relapse prediction on indicated subgroups

Subgroup   Relapse  CCR  # genes  metric   Accuracy (%)  P value by permutation test
T-ALL      8        26   7        t-stats  97            0.034
H>50       5        43   13       t-stats  100           0.018
TEL-AML1   3        56   7        CFS      100           0.145
MLL        5        7    4        t-stats  100           0.104
Others     4        56   20       t-stats  98.3          0.079



As the number of relapse samples was small, in addition to the usual cross
validation experiments, 1000 permutation experiments were also performed for each
subtype-specific relapse study. In each permutation experiment, the samples were
re-partitioned in a manner that preserved class size by randomly swapping the class
labels ("relapse" or "continuous complete remission"). The same metric was
employed to pick the same number of genes as in the original partitioning of the
samples given by the original class labels. SVM was then used to obtain a prediction
accuracy by cross validation for this random partition using these freshly selected
genes. The percentage of these 1000 permutation experiments that achieved at least
the same accuracy as the original samples was taken as a p-value, giving an
indication of how many random partitions of the original samples could match the
original accuracy. The results of these permutation
experiments are summarized in the last column of Table 51 above. These results show
that the high accuracy obtained on the predictability of relapse in T-lineage ALL,
Hyperdiploid >50, and others is unlikely to be a random event. The p-values for the
subtypes of TEL-AML1 and MLL are weaker than those of the other subtypes. However,
in the case of TEL-AML1 the number of relapse samples was exceedingly small (3) and
in the case of MLL the numbers of relapse and non-relapse samples were both very small.
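
The permutation procedure described above can be sketched as follows. This is
a minimal sketch under stated assumptions, not the implementation used for
Table 51: a nearest-centroid classifier with leave-one-out cross validation
stands in for the SVM, a simple mean-difference score stands in for the
t-statistic metric, and all function names and the toy data are hypothetical.

```python
import random
import statistics


def class_separation(values, labels):
    """Stand-in gene metric: |difference of class means| / summed class spread."""
    a = [v for v, y in zip(values, labels) if y == 1]
    b = [v for v, y in zip(values, labels) if y == 0]
    spread = statistics.pstdev(a) + statistics.pstdev(b) + 1e-9
    return abs(statistics.fmean(a) - statistics.fmean(b)) / spread


def select_genes(samples, labels, k):
    """Pick the k gene indices with the highest metric for these labels."""
    n_genes = len(samples[0])
    scores = [class_separation([s[g] for s in samples], labels)
              for g in range(n_genes)]
    return sorted(range(n_genes), key=lambda g: scores[g], reverse=True)[:k]


def loo_accuracy(samples, labels, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier (SVM stand-in)."""
    correct = 0
    for i, held_out in enumerate(samples):
        distance = {}
        for cls in (0, 1):
            members = [s for j, s in enumerate(samples)
                       if j != i and labels[j] == cls]
            centroid = [statistics.fmean(m[g] for m in members) for g in genes]
            distance[cls] = sum((held_out[g] - c) ** 2
                                for g, c in zip(genes, centroid))
        predicted = min(distance, key=distance.get)
        correct += predicted == labels[i]
    return correct / len(samples)


def permutation_p_value(samples, labels, k, n_perm=1000, seed=0):
    """Fraction of label shuffles whose accuracy matches or beats the original."""
    rng = random.Random(seed)
    observed = loo_accuracy(samples, labels, select_genes(samples, labels, k))
    hits = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)  # shuffling labels preserves the class sizes
        genes = select_genes(samples, shuffled, k)  # freshly selected genes
        hits += loo_accuracy(samples, shuffled, genes) >= observed
    return observed, hits / n_perm


# Toy data: gene 0 separates the classes, genes 1 and 2 do not.
samples = [
    [9.0, 2.0, 5.0], [10.0, 3.0, 4.0], [11.0, 2.5, 5.5], [10.5, 3.5, 4.5],
    [1.0, 2.0, 4.0], [0.5, 3.0, 5.0], [1.5, 2.5, 4.5], [1.0, 3.5, 5.5],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = relapse, 0 = continuous complete remission

observed_accuracy, p_value = permutation_p_value(samples, labels, k=1, n_perm=200)
print(observed_accuracy, p_value)
```

Re-selecting the genes inside every permutation, as the text specifies, is the
important design choice: it exposes the random partitions to the same selection
bias as the original partition, so the resulting p-value is not flattered by
gene selection.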

J. Results for secondary AML prediction

For the secondary AML prediction, the same subtype-specific approach was
adopted as described earlier in relapse prediction. This time only the TEL-AML1
subtype had a sufficient number of samples for a secondary AML prediction model to
be developed. For this model, the MIT score (Golub et al. (1999) Science 286:531-
37, herein incorporated by reference) was used to select genes and SVM to perform
classification using these genes. The MIT score of a gene is defined as
$T = |\mu_1 - \mu_2| / (\sigma_1 + \sigma_2)$, where $\mu_i$ is the mean expression
of that gene in the $i$-th class and $\sigma_i$ is the
standard deviation of that gene in the $i$-th class. This formula assigns a higher
value to a gene that has a larger mean difference between the two classes and a
smaller variance within both classes. The 20 genes with the highest MIT scores in
TEL-AML1 patients that went into continuous complete remission versus those
TEL-AML1 samples that developed secondary AML are listed in Table 52 below. An
accuracy of 100% for secondary AML prediction was achieved on TEL-AML1 specific
subtype samples using these 20 genes. A permutation test was also performed in the
same manner as described earlier in the subtype-specific relapse prediction, and a
p-value of 0.031 was obtained, demonstrating that the predictability of the
development of secondary AML in TEL-AML1-specific patients was unlikely to be a
random event.
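
The MIT score defined above can be illustrated with a short calculation. This
is a sketch only: the use of the sample standard deviation is an assumption
(the text does not say whether the sample or population form was used), and
the function names and expression values are hypothetical.

```python
import statistics


def mit_score(class1_values, class2_values):
    """MIT score |mu1 - mu2| / (sigma1 + sigma2) for one gene.

    Assumption: sigma is the sample standard deviation within each class.
    """
    mu1 = statistics.fmean(class1_values)
    mu2 = statistics.fmean(class2_values)
    sigma1 = statistics.stdev(class1_values)
    sigma2 = statistics.stdev(class2_values)
    return abs(mu1 - mu2) / (sigma1 + sigma2)


def rank_genes(expression, top_n):
    """Return the top_n gene names ordered by decreasing MIT score."""
    scores = {gene: mit_score(c1, c2) for gene, (c1, c2) in expression.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Hypothetical expression values: (class 1 samples, class 2 samples) per gene.
expression = {
    "gene_tight": ([10.0, 12.0, 14.0], [2.0, 4.0, 6.0]),   # same gap, small spread
    "gene_noisy": ([6.0, 12.0, 18.0], [0.0, 4.0, 8.0]),    # same gap, large spread
}

tight_score = mit_score(*expression["gene_tight"])   # 8 / (2 + 2)
noisy_score = mit_score(*expression["gene_noisy"])   # 8 / (6 + 4)
ranking = rank_genes(expression, top_n=2)
print(tight_score, noisy_score, ranking)
```

Both toy genes have the same difference in class means, so the gene with the
smaller within-class spread receives the higher score, which is the behavior
the formula is meant to reward.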
Table 52. Genes selected by MIT score for secondary AML

     Affymetrix                                                     Gene       Reference       Above/Below
     Number       Gene Name                                         Symbol     Number          Mean
TEL-AML1
1    34890_at     ATPase H transporting lysosomal vacuolar          ATP6A1     L09235          Above
                  proton pump alpha polypeptide 70kD isoform 1
2    40925_at     hypothetical protein FLJ10803                     FLJ10803   AA554945        Above
3    1719_at      mutS E. coli homolog 3                            MSH3       U61981          Above
4    32877_i_at   EST IMAGE:954213                                  -          AA524802        Above
5    32650_at     neuronal protein                                  NP25       Z78388          Above
6    33173_g_at   hypothetical protein FLJ10849                     FLJ10849   T75292          Above
7    32545_r_at   RSU-1/RSP-1                                       RSU-1      L12535          Above
8    34889_at     ATPase H transporting lysosomal vacuolar          ATP6A1     AA056747        Above
                  proton pump alpha polypeptide 70kD isoform 1
9    35180_at     cDNA DKFZp586F1323 from clone DKFZp586F1323       -          AL050205        Above
10   34274_at     KIAA1116 protein                                  KIAA1116   AB029039        Above
11   35727_at     hypothetical protein FLJ20517                     FLJ20517   AI249721        Above
12   1627_at      tyrosine kinase (GB:Z25437)                       -          HG2715-HT2811   Above
13   1461_at      nuclear factor of kappa light polypeptide         NFKBIA     M69043          Below
                  gene enhancer in B-cells inhibitor alpha
14   36023_at     lacrimal proline rich protein                     LPRP       AI864120        Above
15   39167_r_at   serine or cysteine proteinase inhibitor           SERPINH2   D83174          Above
                  clade H heat shock protein 47 member 2
16   39969_at     H4 histone family member G                        H4FG       AA255502        Above
17   38692_at     NGFI-A binding protein 1 ERG1 binding protein 1   NAB1       AF045451        Above
18   1594_at      polymerase RNA II DNA directed                    POLR2C     J05448          Above
                  polypeptide C 33kD
19   33234_at     RBP1-like protein                                 LOC51742   AA887480        Above
20   34739_at     hypothetical protein FLJ20275                     FLJ20275   W26023          Above

Table 53. Permutation test results for secondary AML

Rank  Affymetrix number  Score   Perm 1%  Perm 5%  Perm median  neighbors
1     34890_at           1.2204  2.7933   2.2138   1.4712       822
2     40925_at           1.0712  2.0006   1.7607   1.2884       859
3     1719_at            1.0599  1.8536   1.6272   1.1894       767
4     32877_i_at         1.0364  1.7125   1.5218   1.1200       715
5     32650_at           1.0217  1.6580   1.4584   1.0776       646
6     33173_g_at         1.0126  1.5868   1.4132   1.0416       595
7     32545_r_at         1.0097  1.5536   1.3630   1.0223       536
8     34889_at           0.9959  1.5164   1.3241   1.0009       512
9     35180_at           0.9854  1.4838   1.2938   0.9777       477
10    34274_at           0.9420  1.4759   1.2721   0.9600       550
11    35727_at           0.8493  1.4482   1.2507   0.9415       809
12    1627_at            0.8471  1.4207   1.2398   0.9254       782
13    1461_at            0.8312  1.4012   1.2260   0.9114       801
14    36023_at           0.8177  1.3551   1.2012   0.8995       813
15    39167_r_at         0.8136  1.3462   1.1806   0.8894       790
16    39969_at           0.8122  1.3395   1.1702   0.8785       759
17    38692_at           0.8109  1.3333   1.1565   0.8696       729
18    1594_at            0.8103  1.3142   1.1503   0.8626       696


Table 54: Additional Genes selected by T statistics for BCR-ABL risk group

Gene symbol       Accession Number
TUBA1             HG2259-HT2348
TUBA1             X06956
CRADD             U84388
SLC2A5            M55531
PHYH              AF023462
ZFPL1             AF001891
CD34              S53911
KIAA0015          D13640
CLECSF2           X96719
CD34              M81945
GAB1              U43885
E2F5              U31556
CLTB              M20470
ENG               X72012
LOC55884          AF038187
TNFRSF1A          M58286
TMSNB             D82345
SNL               U03057
KIAA0990          AB023207
MAP1A             W26631
MYPT2             AB007972
IFI30             J03909
ERPROT213-21      U94836
DKFZP586A0522     AL050159
LOC51109          AA126515
-                 W29087
TSTA3             U58766
TNFRSF1B          AI813532
GSN               X04412
KIAA0582          AI761647
STATI2            AF037989
-                 AL049313
ITGA4             X16983
FLJ20500          AA522530
SDR1              AF061741
ARHGEF4           AB029035
C18ORF1           AF009426
MAPK14            U19775
FHL1              AF063002
GATA3             X58072
KIAA0076          D38548
KCNN1             U69883
POM121L1          D87002
IFI30             J03909
ABL1              X16416
NELL2             D83018
MEST              D78611
S100A4            W72186
D12S2489E         AJ001687
ATP2B4            W28589
CTGF              X78947
RGS1              S59049
CDK9              X80230
-                 AI524873
STIM1             U52426
VEGFB             U48801
PPP2R2A           M64929
CASP2             U13022
SPS               U34044
HRK               D83699
KIAA0870          AB020677
ABL1              U07563
PKIA              S76965
FLJ12474          AA306076
CD97              X94630
HCK               M16591
FYN               M14333
KIR2DL3           AC006293
DMPK              L08835
N33               U42360
FLJ13949          AL041879
PRKCZ             Z15108
IL17R             U58917
FMR2              U48436
INSR              M10051
AHNAK             M80899
KIAA0878          AB020685
CD86              U04343
-                 U82303
KIAA1043          AL033538
N33               U42349
SYN47             Y17829
ITPR1             D26070
SFRS9             AL021546
EPOR              M60459
GAC1              AF030435
CAMK4             D30742
KIAA0084          D42043
LAT               AJ223280
XBP1              Z93930
FLT3LG            U03858
TESK1             -
-                 AF070633
KIAA0681          U89358
^UTS              Y17979


Table 55: Additional Genes selected by T statistics for E2A-PBX1 Risk Group

Gene symbol       Accession Number
PBX1              M86546
-                 AL049381
FAT               X87241
BLK               S76617
IRF4              U52682
GS3955            D87119
KIAA0802          AB018345
SCHIP-1           AF070614
SNL               U03057
KIAA0655          AB014555
GS3955            D87119
IGFBP7            L19182
CDKN1A            U03106
CSF2RB            H04668
STATI2            AF037989
KIAA1029          -
KIAA0247          D87434
-                 AL049397
NP                X00737
TM4SF2            L10373
ALOX5             J03600
LRMP              U10485
r IrJNz           AI828880
ALOX5AP           AI806222
AEBP1             AF053944
TGFBR2            D50683
ODC1              M33764
NID2              D86425
ODC1              X16277
CBX1              U35451
CSF3R             M59820
KIAA0172          D79994
IL1B              M15330
KIAA0922          AB023139
LOC51097          AA065dl8
TUBA1             X06956
ITGA6             S66213
NFKBIL1           Y14768
ADPRT             J03473
ADPRT             J03473
CSF3R             M59818
EFNB1             U09303
CD9               M38690
CDKN2D            U40343
KIAA0442          AB007902
PRKCZ             Z15108
-                 AF055029
RECK              D50406
GOLGA3            D63997
ZAP70             L05148
FLI1              M98833
LASP1             X82456
-                 AJ001381
TBXA2R            D38081
BHLHB2            AB004066
ADARB1            U76421
PTPN6             X62055
-                 X58398
TIMP1             D11139
KIAA0554          AB011126
SRP14             AI525652
ATP9A             AB014511
HELO1             AL034374
GNAQ              U43083
POU4F1            X64624
MERTK             U08023
KIAA0625          AB014525
-                 AB011131
IL7R              AF043129
11 uAO            -J ^/ O V/
TUBA1             HG2259-HT2348
PIR121            L47738
MAGED1            W26633
CD48              M37766
TLR1              AL050262
NPR1              X15357
GLUL              X59834
DAPK1             X76104
-                 X58398
ARHGEF4           AB029035
NKEFB             L19185
-                 AL049435
ITM2A             AL021786
RAG2              M94633
SCGF              AF020044
PRKACB            M34181
KCNN4             AF022797
KCNN1             U69883
MAPKAPK2          U12779
PIN               AI540958
TOP2B             X68060
GATA2             M68891
IL1B              X04500
PDE3B             U38178
DGKD              D73409
KIAA0993          AB023210
ADAM10            AF009615
IGLL1             M27749
PDLIM1            U90878
PRKAR1A           M33336
CD34              S53911
GLA               U78027
BAZ1B             AF072810
EFNA1             M57730
FADS3             AC004770
FLT3              U02687
LOC57228          AF091087
BCL6              U00115
BMP2              M22489
CD22              X59350
KIAA0429          AB007889
DKFZP434C171      AL080169
CTBP2             AF016507
-                 M11810
SIAT9             AB018356
CYBB              X04011
AKR1B1            X15414
NFKBIL1           Y14768
UBE2V1            U49278
DOC-1R            AF089814
BUB3              AF047473
IL7R              M29696
ACK1              L13738
ENIGMA            L35240
KIAA1071          AB028994
IGL               AI932613
MN1               X82209
KIAA0823          AB020630
NFKB1             M58603
CD24              L33930
YWHAQ             X56468
VDAC1             L06132
P85SPR            D63476
SYNGR1            AL022326
NDR               Z35102
JMJ               AL021938
PRSC1             D55696
MRC1              M93221
-                 AI184710
CRIP1             AI017574
KIAA0056          D29954
-                 AF039397
-                 U79265
SLAM              U33017
LYL1              AC005546
KIAA0620          AB014520
VDAC1P            AJ002428
SRP9              AF070649
PRDX1             X67951
SLC9A3R1          AF015926
CD72              M54992
ECM1              U68186
PPP2R5A           L42373
HDGF              D16431
MERTK             U08023
-                 L02326
CD34              M81945
IL17R             U58917
ARL7              AB016811
P4HA2             U90441
BZRP              M36035
F13A1             M14539
KRAS2             M54968
BS69              X86098
ORP150            U65785
-                 D28915
LEF1              AL049409
SH2D1A            AL023657
LY6E              U66711
FACVL1            D88308
EPB42             M60298
-                 AL049471
BMI1              L13689
KCNJ13            N36926
N33               U42349
VIL2              X51521
CCNG2             U47414
C18ORF1           AF009425
NUMA1             Z11584
DBN1              U00802
FLT3              U02687
KIAA0854          AB020661
MGC4175           AI656421
KIAA1012          AB023229
CIRBP             D78134
ST5               U15131
KIAA0001          D13626
CCR1              D10925
CD19              M28170
SNRPE             AA733050
CR2               M26004
HEXA              M16424
IFIT4             AF026939
-                 W26667
EPOR              M60459
TMSNB             D82345
GCLM              L35546
H41               H15872
TUBB2             HG1980-HT2023
TNFAIP2           M92357
GAB1              U43885
PTPRK             L77886
BCL7A             X89984


Table 56: Additional Genes selected by T statistics for Hyperdiploid >50
Risk Group

Gene symbol       Accession Number
SH3BP5            AB005047
FT                -
MX1               M33882
NPY               AI198311
SOD1              X02317
PTPRK             L77886
IL1B              X04500
CD9               M38690
FLT3              U02687
PGK1              V00572
EFNB1             U09303
FOS               K00650
IL1B              M15330
MRC1              M93221
HMG14             J02621
SNRP70            X06815
PDLIM1            U90878
ALOX5             J03600
RAG2              M94633
CALM1             U12022
KIAA1013          AB023230
NDUFA1            N47307
FOS               V01512
DXS1357E          X81109
ICSBP1            M91196
ETS2              J04102
PCDH9             AI524125
LILRA2            AF025531
PSAP              J03077
SCHIP-1           AF070614
CCND2             D13639
KCNN1             U69883
ALTE              AB018328
IGFBP4            U20982
1VJ.27            AB019392
-                 Y18004
LOC51632          -
-                 AF032456
-                 AF037989
ATRX              U72936
APT6M8-9          AL049929
PTPRE             X54134
GILZ              AI635895
PECAM1            AA100961
ARHGEF4           AB029035
ECM1              U68186




Table 57: Additional Genes selected by T statistics for the MLL Risk Group

Gene symbol       Accession Number
EPOR              M60459
CD44              L05424
PRKCH             M55284
MADH1             U59423
KLF1              U65404
MME               J03779
PTPRK             L77886
IL1B              X04500
YES1              M15990
ARPC2             U50523
IGFBP4            M62403
ITPR3             U01062
-                 M13929
EFNB1             U09303
FHIT              U46922
NME2              X58965
CCND2             X68452
MPB1              M55914
CDH2              M34064
IGFBP7            L19182
ALOX5             J03600
PTGDR             U31099
PLXNC1            AF030339
EIF3S2            U39067
BLVRA             X93086
HSPC022           W68830
-                 S67247
MYLK              U48959
SLC6A11           S75989
-                 X67098
SERPINB1          M93056
LGALS1            AI535946
HRK               D83699
-                 AL049313
HBS1L             AB028961
KIAA0437          AB022660
GDI2              Y13286
ITGA4             X16983
EEF1B2            X60489
MD-1              AB020499
POU4F1            X64624
TST               X59434
PTPRF             Y00815
ARHGEF4           AB029035
SCHIP-1           AF070614
ASMTL             AA669799
DDR1              L20817
N33               U42360
CR2               M26004
AHNAK             M80899
SCGF              AF020044
EPB49             U28389
PSPHL             AJ001612
MADH1             U59912
ITPR3             U01062
DPEP1             J05257
AKAP12            U81607
DBI               AI557240
KIAA0736          AB018279
MAL               X76220
S100A4            W72186
MDK               X55110
CRK               D10656
CAPG              M94345
KCNH2             U04270
KIAA1069          AB028992
DKFZP564LUoo2     AT ORfiOQI
KIAA0298          AB002296
DGKD              D73409
DEPP              AB022718
-                 AL049957
CD8B1             X13444
EFNB1             U09303
-                 AI391564
LDOC1             AB019527
EFNA1             M57730
CD44              L05424
PTPRC             Y00062
PTPRC             Y00638
PTPRC             Y00638
TFPI              M59499
TSPAN-5           AF065389
BCL11A            W27619
-                 AJ001381
KIAA1011          AL080133
FYB               U93049
DKFZp761F2014     AA149431
FGFR1             X66945
-                 M63589
PTPN6             X62055



Table 58: Additional Genes selected by T statistics for the Novel Risk Group

Gene symbol       Accession Number
CHST2             AB014679
CLTC              D21260
TUBA1             X06956
GNG11             U31384
PCDH9             AI524125
MDS019            AA442560
RAG2              M94633
ITGA6             X53586
UBE2E3            AB017644
CD34              S53911
CD34              M81945
FGFR1             M34641
ECM1              U68186
MADH1             U59423
FUT7              ABUlzooo
PROML1            A T?/~\0 TO A o
CSNK2A1           TV if"C CO C
FLNB              Ar 042 1 oo
MADH1             U59912
LIG4              X83441
ZNF151            Y 09/2:5
CSF3R             M59818
-                 A-L08020;)
STAU2             AJ^U/y2oo
AEBP1             AF053944
KIAA0320          AB002318
KIAA0746          AB018289
PTPRM             X58288
IGFBP4            M62403
ZNF266            AA868898
PDLIM1            U90878
MTMR3             AB002369
TIMP1             D11139
TTC2              W28595
TM4SF2            L10373
PSA               AA978353
HTR4              Y12505
MMS19L            AJb0071M
-                 AI391564
TJP2              L27476
BMP2              M22489
ARL7              AB016811
TLR1              AL050262
SMC2L1            AF092563
TGFBR2            D50683
TGFBR2            D50683
SPARC             J03040
GPRK5             L15388
CDH2              M34064
KIAA0877          AB020684
ABLIM             D31883
RNF3              W25793
CCBP2             U94888
CHN2              U07223
ITGA4             X16983
IQGAP2            U51903
FLJ22531          W80358
PIK3CD            U86453
FXYD2             H94881
-                 W30677
AMPD3             U29926
-                 D78577
KIAA0125          D50915
FADS3             AC004770
DKFZP434C171      AL080169
EST00098          AI885170
BMP2              M22489
LILRB4            AF072099
KIAA0429          AB007889
DKFZP586G0522     AL050289
-                 U92818
ATIC              D82348
MONDOA            AB020674
CNK1              AF100153
NGFR              M14764
KIAA0540          AB011112
MYO10             AB018342
PIASX-BETA        AF077954
ACVR1             Z22534
ARHGEF10          AB002292
PON2              AF001601
TST               X59434
SPTBN1            M96803
ERCC2             AA079018
PRSC1             D55696
DKFZP434D174      AL080150
-                 AI184710
CD8B1             X13444
-                 U79265
DKFZp761F2014     AA149431
MEF2A             U49020
JAG2              AF029778
ZNF143            AF071771
CASP1             U13697
HAP1              AF040723
FABGL             D82061
ALDH1             K03000
RAD9              U53174
-                 AL109722
CDC27             AA166687
B4GALT1           D29805
PTPRM             X58288
AHR               L19872
N33               U42349
IL12RB2           U64198
MTR               U73338
KIAA0697          AB014597
CSNK2B            M30448
-                 U15590
-                 W28612
HSU79253          AF052186
RBBP1             S57153
S100A11           D38583
TCF12             M80627
-                 AI971169
EEF1E1            N32257
SAP18             AW021542
PVRL1             AF060231
-                 M13929
MKP-L             AF038844
-                 W26667
CD79B             M89957
KIAA0437          AB022660
-                 AF070633
GCLM              L35546
EDG6              AJ000479
MAL               X76220




Table 59: Additional Genes selected by T statistics for the T-ALL Risk Group

Gene symbol       Accession Number
SLP65             AF068180
CD3D              AA919102
SH2D1A            AL023657
CD79B             M89957
CD3E              M23323
CTGF              X78947
PFTK1             AB020641
TRB               X00437
CD24              L33930
CD22              X52785
TOP2B             X68060
CD22              X59350
TCL1A             X82240
BRAG              AB011170
CD79A             U05259
SCHIP-1           AF070614
MAL               X76220
HLA-DQB1          M16276
PDE4B             L20971
HLA-DQB1          M60028
CD19              M28170
LILRA2            AF025531
PTPN18            X79568
MEF2C             L08895
PTP4A2            U14603
NPY               AI198311
GAB1              U43885
LCK               U23852
TCF7              X59871
TERF2             X93512
ITM2A             AL021786
MEF2C             S57212
SLC9A3R1          AF015926
ENG               X72012
DEPP              AB022718
IL1B              X04500
IL1B              M15330
ECM1              U68186
HLA-DMA           X62744
CRMP1             D78012
WFS1              AF084481
PRKCQ             L01087
GNG7              AB010414
-                 X58398
CDKN1A            U03106
CD9               M38690
PTK2              L13616
TRB               M12886
IFI35             L78833
NUCB2             X76732
KIAA0942          AB023159
VAT1              U18009
ARL7              AB016811
USP20             AB023220
PLCG2             X14034
PRDX1             X67951
POU2AF1           Z49194
CMAH              D86324
ALOX5             J03600
PTPN7             M64322
MEF2C             S57212
KIAA0668          AL021707
LOC54103          AL079277
EFNB1             U09303
HELO1             AL034374
ADF               S65738
KIAA0906          AB020713
IGFBP4            U20982
LDHB              X13794
CTNNA1            U03100
ENO2              X51956
LAT               AJ223280
PTPN7             D11327
-                 M16942
CSRP2             U57646
GLA               U78027
ADA               X02994
RGS10             AF045229
KIAA0870          AB020677
CD3Z              J04132
STATI2            AF037989
GSN               X04412
INSR              X02160
HLA-DNA           M31525
CD72              M54992
EPHB6             D83492
MYLK              U48959
HLA-DQA1          AA868382
LCK               M36881
FHL1              AF063002
CRIM1             AI651806
AQP3              N74607
HLA-DQB1          M81141
GNG11             U31384
LARGE             AJ007583
FOXO1A            AF032885
NPR1              X15357
GAB1              U43885
PTPRE             X54134
PDLIM1            U90878
NCF4              AL008637
ARHGEF4           AB029035
PTP4A2            U14603
CTNNA1            AF102803
SEPW1             U67171
CHI3L2            U58515
LILRA2            U82277
CD79A             U05259
TCL1B             AB018563
TCF4              M74719
TACTILE           M88282
-                 AB002438
TXN               AI653621
ADE2H1            X53793
-                 AL049449
GLUL              X59834
ZFHX1B            AB011141
P4HB              M22806
IFITM1            J04164
KIAA0182          D80004
SH2D1A            AF100539
GNA11             M69013
NCF4              AL008637
SLC2A5            M55531
KIAA0993          AB023210
HLA-DPB1          M83664
HLX1              M60721
CTNNA1            D14705
FADS3             AC004770
GATA3             X58072
GDI2              Y13286
TM4SF2            L10373
GNA15             M63904
BTG2              U72649
RAG1              M29474
MDK               X55110
-                 X00457
AKR1C3            D17793
SLA               D89077
LDHA              X02152
-                 AL049279
PTPRC             Y00638
BMP2              M22489
ERG               M17254
ICSBP1            M91196
CCT2              AF026166
AKAP2             AB023137
-                 X58398
KIAA0128          D50918
IGHM              X58529
NOTCH3            U97669
JUP               M23410
DKFZP586Q1624     AL039458
MYO10             AB018342
CTNNA1            L23805
NOS2A             U31511
-                 D00749
-                 L29376
ICB-1             AF044896
GNAI1             AL049933
S100A11           D38583
MAPKAPK3          U09578
ADA               M13792
S100A13           AI541308
VDAC3             AF038962
-                 AL049265
TRIM              AJ224878
CTBP2             AF016507
F13A1             M14539
ZNF43             HG620-HT620
DKFZp761F2014     AA149431
KIAA0442          AB007902
CTNNA1            U03100
CD2               M16336
BMP2              M22489
HSPC022           W68830
ICAM3             X69819
NCF4              X77094
GS3955            D87119
CTSC              X87212
GH1               V00520
ARPC2             U50523
HLA-DRB1          M32578
GAS1              L13698
LAMB2             M55210
EPHB4             U07695
COX8              AI525665
KIAA0618          N29665
KIAA0870          AI808958
PIK3CG            X83368
IGHD              K02882
IRF4              U52682
HSPCB             M16660
CAPN3             X85030
CD6               X60992
WSX-1             AI263885
FXYD2             H94881
PTK2              HG3075-HT3236
FUCA1             M29877
FADS2             AL050118
KARS              D32053
DSCR1             U85267
SOX4              X70683
TRD               X73617
MHC2TA            U18259
-                 AL049435
MDK               M94250
CALM1             U12022
PCLO              AB011131
-                 AI391564
FHIT              U46922
MONDOA            AB020674
TRG               M30894
SPIB              X66079
FLJ10097          AL035494
TAGLN2            D21261
LGALS9            Z49107



Table 60: Additional Genes selected by T statistics for the TEL-AML1 Risk
Group

Gene symbol       Accession Number
ARHGEF4           AB029035
TNFRSF7           M63928
PCLO              AB011131
TGFL5             A.P0.1-0.1 OA
KCNN1             U69883
NME2              X58965
PTPRK             L77886
-                 AL049313
TERF2             X93512
GNG11             U31384
RAG1              M29474
-                 AL080190
MADH1             U59423
-                 HG3523-HT4899
MADH1             U59912
P114-RHO-GEF      AB011093
-                 L29254
MDK               M94250
TERF2             AF002999
CRMP1             D78012
HLA-DOB           X03066
NFKBIL1           Y14768
-                 AA216639
-                 AL080059
CBFA2T3           AB010419
MDK               X55110
PIK3C3            Z46973
ALOX5             J03600
PTP4A3            AF041434
POU2AF1           Z49194
POU4F1            L20433
PRKCB1            X07109
GCAT              Z97630
PHYH              AF023462
SPTA1             M61877
IDI1              X17025
FYB               U93049
ITPR1             D26070
GTT1              AL041780
FADS3             AC004770
CCT2              AF026166
ISG20             U88964
SCHIP-1           AF070614
DR6               AF068868
MYO10             AB018342
ZNF91             L11672
T-STAR            AF051321
FUCA1             M29877
HLA-DQB1          M60028
-                 AB002438
CTGF              X78947
FKBP1A            M34539
-                 AI391564
RAB1              AL050268
INSR              X02160
KIAA0540          AB011112
TM4SF2            L10373
CASP1             M87507
MT1L              AA224832
MME               J03779
-                 AI743299
KARS              D32053
CHN2              U07223
IQGAP2            U51903
KIAA0906          AB020713
STATI2            AF037989
HLA-DMA           X62744
CD36L1            Z22555
PRKCB1            X06318
GS3955            D87119
ACTN1             X15804
FLJ20154          AF070644
KIAA0769          AB018312
SDC1              Z48199
SOX4              X70683
NRTN              U78110
CTNND1            AB002382
FHIT              U46922
FARP1             AI701049
FOXO1A            AF032885
NPY               AI198311
VDUP1             S73591
H2AFO             AI885852
TACTILE           M88282
SNL               U03057
JUP               M23410
NR3C2             M16801
PRPS2             Y00971
LILRA2            AF025531
RNAHP             H68340
DPYSL2            U97105
ITGB2             M15395
PCDH9             AI524125
LAIR1             AF013249
CD79A             U05259
NFKBIL1           Y14768
PCCA              S79219
HLA-DMB           U15085
SMARCA4           D26156



EXAMPLE 2

To identify additional genes whose expression levels could be used
as a diagnostic tool to identify ALL subgroups, leukemic blasts from 132 diagnostic
samples were analyzed using higher density oligonucleotide arrays that allow the
interrogation of a majority of the identified genes in the human genome.

A subset of the 327 diagnostic pediatric ALL samples described above was
reanalyzed using these higher density microarrays. Case selection was based on
providing a representation of the known prognostic ALL subtypes including
t(9;22)[BCR-ABL], t(1;19)[E2A-PBX1], t(12;21)[TEL-AML1], rearrangement in the
MLL gene on chromosome 11q23, and hyperdiploid karyotype with >50
chromosomes. Since the goal was to define expression profiles that could be used to
accurately diagnose the known prognostic subtypes of ALL, we chose to
over-represent these subtypes compared to what is normally seen in a random
population of childhood leukemia patients. A total of 132 samples met these criteria
and had sufficient material remaining to be used for this analysis. The list of samples
and subtype distribution of the cases used in this study are shown in Tables 61 and 62,
respectively.



Table 61. Diagnostic ALL samples used for class prediction (n=132)

BCR-ABL-#1               Hyperdip>50-C18          Pseudodip-#6
BCR-ABL-#2               Hyperdip>50-C21          Pseudodip-C2-N
BCR-ABL-#3               Hyperdip>50-C22          Pseudodip-C3
BCR-ABL-#4               Hyperdip>50-C23          Pseudodip-C5
BCR-ABL-#5               Hyperdip>50-C27-N        Pseudodip-C6
BCR-ABL-#6               Hyperdip>50-C32          Pseudodip-C7
BCR-ABL-#7               Hyperdip>50-R4           Pseudodip-C9
BCR-ABL-#8               Hyperdip47-50-C14-N      Pseudodip-C14
BCR-ABL-#9               Hyperdip47-50-C3-N       Pseudodip-C16-N
BCR-ABL-Hyperdip-#10     Hypodip-#2               Pseudodip-R1-N
BCR-ABL-C1               Hypodip-2M#1             T-ALL-#5
BCR-ABL-R1               Hypodip-C2               T-ALL-#6
BCR-ABL-R2               Hypodip-C5               T-ALL-#7
BCR-ABL-R3               MLL-#1                   T-ALL-#8
BCR-ABL-Hyperdip-R5      MLL-#2                   T-ALL-#10
E2A-PBX1-#5              MLL-#3                   T-ALL-C2
E2A-PBX1-#6              MLL-#4                   T-ALL-C6
E2A-PBX1-#9              MLL-#5                   T-ALL-C7
E2A-PBX1-#10             MLL-#6                   T-ALL-C11
E2A-PBX1-#12             MLL-#7                   T-ALL-C15
E2A-PBX1-#13             MLL-#8                   T-ALL-C19
E2A-PBX1-2M#1            MLL-2M#1                 T-ALL-C21
E2A-PBX1-C2              MLL-2M#2                 T-ALL-R5
E2A-PBX1-C3              A/TT T ni                T-ALL-R6
E2A-PBX1-C4              A/fT t no                TEL-AML1-#6
E2A-PBX1-C5              A/FT T -PI               TEL-AML1-#9
E2A-PBX1-C6                                       TEL-AML1-#10
E2A-PBX1-C7                                       TEL-AML1-#14
E2A-PBX1-C9                                       TEL-AML1-2M#1
E2A-PBX1-C10             A/fT T -U 1              TEL-AML1-2M#2
E2A-PBX1-C11             AAT T JR?                TEL-AML1-C4
E2A-PBX1-C12                                      TEL-AML1-C5
E2A-PBX1-R1              MLL-R4                   TEL-AML1-C6
rlyp eraip^D u-tFo       Normal-C1-N              TEL-AML1-C26
riyperurp^ou-Tfiz        Normal-C2-N              TEL-AML1-C28
rlyp eraip^D u-7? i h-   Normal-C3-N              TEL-AML1-C30
Jtlyp erarp^o u-v^ i     Normal-C4-N              TEL-AML1-C31
rlyp eraip>3             Normal-C7-N              TEL-AML1-C32
rlyperaip>D u-l^o        Normal-C8                TEL-AML1-C33
jiyperaip>:> u-L^o       Normal-C9                TEL-AML1-C34
rlyperaip>!>u-ui i       Normal -PI 1-N           TEL-AML1-C37
xiyperaip-^D u-v^ i d    Normal-R1                TEL-AML1-C38
rlyperaip>D u-c 1 d                               TEL-AML1-C40
Hyperdip>50-C16          Pseudodip-#5             TEL-AML1-R3

* Subtype Name-C#     Dx Sample of patient in CCR
  Subtype Name-R#     Dx Sample of patient who developed a hematologic relapse
  Subtype Name-#      Dx Sample used for subgroup classification only
  Subtype Name-2M#    Dx Sample of patient who later developed 2nd AML
  Subtype Name-N      Dx Sample in novel group

Table 62. Subgroup distribution of ALL cases

Subgroup            Train Set   Test Set
BCR-ABL             11          4
E2A-PBX1            13          5
Hyperdiploid >50    13          4
MLL                 15          5
T-ALL               12          2
TEL-AML1            15          5
Other               21          7
Total               100         32



A total of 26,825 probe sets from the combined Affymetrix® brand U133A and B 
microarrays (Affymetrix, Inc., Santa Clara, CA) showed variation in expression levels 
across the 132 diagnostic leukemia samples. In an initial analysis of these data, two 
complementary unsupervised clustering algorithms, two-dimensional hierarchical 
clustering and principal component analysis (PCA), were used to assess the major 
sub-groupings of the leukemia cases based solely on gene expression profiles. These 
unbiased clustering algorithms demonstrated that the pediatric ALL cases cluster 
primarily into seven major subtypes: T-ALL and 6 subtypes of B-cell lineage ALL 
corresponding to (1) rearrangement in the MLL gene on chromosome 11q23, (2) 
t(1;19)[E2A-PBX1], (3) hyperdiploid >50 chromosomes, (4) t(9;22)[BCR-ABL], (5) 
the novel subgroup, and (6) t(12;21)[TEL-AML1]. In addition, a heterogeneous group 
of B-lineage cases was identified that lacked any of the defined genetic lesions and 
failed to cluster into the novel subgroup. Several of these leukemia subtypes formed 
distinct branches when all differentially expressed genes were used in the 
two-dimensional hierarchical clustering algorithm (T-ALL, hyperdiploid >50 
chromosomes, and TEL-AML1), whereas other subtypes clustered in multiple 
branches, suggestive of gene expression differences within these subclasses. Using 
PCA, the distinct nature of the B-cell lineage subtypes was better appreciated when the 
T-ALL cases were removed from the analysis. A diagnostic accuracy of 100% was 
achieved for only two of the leukemia subtypes (T-ALL and TEL-AML1), indicating the 
need to use supervised learning algorithms to achieve optimal diagnostic accuracy by 
gene expression profiling. 
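
By way of illustration only, unsupervised grouping of samples by expression profile may be sketched as follows. The sketch is not the clustering software used in this example: it clusters samples only (a two-dimensional analysis would apply the same procedure to the probe sets as well), it assumes a correlation-based distance and average linkage, and the sample names and signal values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson_distance(a, b):
    """Return 1 - Pearson correlation between two expression profiles."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 1.0  # a flat profile carries no correlation information
    return 1.0 - cov / sqrt(var_a * var_b)


def average_linkage(profiles, n_clusters):
    """Agglomerative clustering of samples; returns sorted lists of sample names."""
    clusters = [[name] for name in profiles]

    def dist(c1, c2):
        # Average pairwise distance between the members of two clusters.
        pairs = [(a, b) for a in c1 for b in c2]
        return sum(pearson_distance(profiles[a], profiles[b]) for a, b in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Merge the two closest clusters.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: dist(clusters[ij[0]], clusters[ij[1]]))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(sorted(c) for c in clusters)


# Hypothetical signal values for four probe sets in four samples.
toy_profiles = {
    "T1": [9, 8, 1, 1],
    "T2": [8, 9, 1, 2],
    "B1": [1, 2, 9, 8],
    "B2": [2, 1, 8, 9],
}
groups = average_linkage(toy_profiles, n_clusters=2)
```

In this toy input the two T-like samples and the two B-like samples fall into separate branches, which mirrors the observation that some subtypes form distinct branches on expression data alone.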

Statistical methods were used to identify probe sets that were the best 
discriminators of the individual leukemia subtypes. In order to identify the genes that 
provide the highest accuracy in diagnosing specific prognostic subtypes of leukemia, 
the decision tree format described elsewhere herein was used for the identification of 
leukemia subtypes. Briefly, we first determine whether a case is T- or B-cell in lineage. 
If the case is classified as T-cell, a diagnosis of T-ALL is made. If non-T, we then 
determine whether the case can be classified into one of the known B-cell lineage risk 
groups, deciding sequentially if it is E2A-PBX1, TEL-AML1, BCR-ABL, rearranged 
MLL gene, and lastly hyperdiploid with >50 chromosomes. Cases not assigned to one 
of these classes are left unassigned. The use of this decision tree format directly 
influences the selection of genes, allowing the selection of discriminating genes for 
groups lower down the tree that might also be expressed by subtypes higher in the 
tree. Using a number of different supervised learning algorithms, it was found that a 
higher diagnostic accuracy is obtained using this decision tree format, as compared to 
a parallel format in which each class is identified against all others. 
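
The sequential decision described above may be sketched as follows. This is a minimal illustration rather than the classifier actually used: the per-subtype tests are passed in as hypothetical callables, and the toy tests simply threshold a hypothetical per-subtype score.

```python
# Order in which subtypes are tested, as described in the text.
DECISION_ORDER = [
    "T-ALL",
    "E2A-PBX1",
    "TEL-AML1",
    "BCR-ABL",
    "MLL",
    "Hyperdiploid >50",
]


def classify(case, classifiers, order=DECISION_ORDER):
    """Walk the decision tree and return the first subtype whose test accepts the case.

    `classifiers` maps each subtype name to a callable returning True or False.
    Cases rejected at every node are left unassigned.
    """
    for subtype in order:
        if classifiers[subtype](case):
            return subtype
    return "Unassigned"


# Hypothetical stand-in tests: each one thresholds a made-up score stored on the case.
toy_classifiers = {
    subtype: (lambda case, s=subtype: case.get(s, 0.0) > 0.5)
    for subtype in DECISION_ORDER
}

first = classify({"BCR-ABL": 0.9}, toy_classifiers)
second = classify({"T-ALL": 0.9, "MLL": 0.9}, toy_classifiers)
third = classify({}, toy_classifiers)
```

Because a case is removed from consideration as soon as one node accepts it, a test lower in the tree never has to distinguish its subtype from those already decided above it.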

Discriminating genes were selected using a chi-square metric on the 100 cases 
in the training set. Genes were selected that discriminated between a class and all 
leukemia subtypes below it in the decision tree. The numbers of discriminating probe 
sets per leukemia subtype at a statistical significance level of p < 0.001 (as determined 
by a permutation test) were: T-ALL, 2063; E2A-PBX1, 1059; TEL-AML1, 805; 
BCR-ABL, 201; MLL chimeric genes, 726; and hyperdiploid with >50 chromosomes, 
994. The lists of discriminating genes obtained using the top 100 ranked probe sets for 
the six prognostically important subgroups are contained in Tables 63-68. As multiple 
probe sets for the same gene are present on Affymetrix microarrays, the top 100 
ranked probe sets represent between 75 and 92 distinct genes, depending on the 
leukemia subtype. As shown, distinct groups of either over- or under-expressed genes 
distinguish cases defined by E2A-PBX1, MLL gene rearrangement, T-ALL, 
hyperdiploid >50 chromosomes, BCR-ABL, and TEL-AML1. 
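
A chi-square score with a permutation-based significance estimate may be sketched as follows. How expression values were discretized is not stated in this passage, so the sketch assumes a single signal threshold per probe set; the threshold, the number of permutations, and the toy values are all hypothetical.

```python
import random


def chi_square_2x2(values, in_class, threshold):
    """Chi-square statistic for (signal above threshold) versus (class membership)."""
    a = b = c = d = 0
    for value, member in zip(values, in_class):
        high = value > threshold
        if member and high:
            a += 1
        elif member:
            b += 1
        elif high:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # an empty row or column means no association can be measured
    return n * (a * d - b * c) ** 2 / denom


def permutation_p_value(values, in_class, threshold, n_perm=1000, seed=0):
    """Estimate a p-value by shuffling class labels and recomputing the statistic."""
    rng = random.Random(seed)
    observed = chi_square_2x2(values, in_class, threshold)
    labels = list(in_class)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if chi_square_2x2(values, labels, threshold) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Hypothetical probe set: high signal in the 4 class cases, low in the 6 others.
signals = [10, 11, 12, 9, 1, 2, 1, 3, 2, 1]
labels = [True] * 4 + [False] * 6
chi2, p_value = permutation_p_value(signals, labels, threshold=5)
```

For selection within the decision tree, the non-class cases passed to such a function would be restricted to the subtypes below the class being tested, so that subtypes already decided higher in the tree do not influence the score.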

The following tables contain a list of the top 100 probe sets for each diagnostic 
subtype, ranked by their chi-square value. Each table contains the Affymetrix® U133 
series probe set number, a gene description, gene symbol, chromosomal location, and 
primary GenBank reference. Chi-square values were calculated utilizing only the 
samples in the train set in a differential diagnosis decision tree format. The 
calculation of the fold change was done in a parallel format using the total data set 
and comparing the mean signal value in the class versus the mean signal value in the 
non-class. 
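
The fold change and the above/below designation reported in the tables may be illustrated as follows. The sketch assumes that the fold change is expressed as a ratio of at least 1 with the direction reported separately, which is consistent with the table entries but is not stated explicitly; the signal values are hypothetical.

```python
def fold_change(class_signals, non_class_signals):
    """Compare mean signal in the class with mean signal in all non-class samples.

    Returns ("Above" or "Below", ratio), where the ratio is always >= 1.
    Signals are assumed to be positive.
    """
    mean_in = sum(class_signals) / len(class_signals)
    mean_out = sum(non_class_signals) / len(non_class_signals)
    if mean_in >= mean_out:
        return "Above", mean_in / mean_out
    return "Below", mean_out / mean_in


# Hypothetical signals for one probe set.
direction_up, ratio_up = fold_change([200, 400], [100, 100, 100])
direction_down, ratio_down = fold_change([50, 50], [100, 200])
```

Note that this comparison is made in the parallel format, i.e. the class against every other sample in the total data set, whereas the chi-square ranking uses only the training set in the decision tree format.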

Table 63. Top 100 chi-square probe sets selected for BCR-ABL 

No. | U133 probe set | Gene description | Gene symbol | Chromosomal location | GenBank reference | Chi-square value | BCR above/below mean | Fold change 


1 


241812 at 


EST FLJ39877 


FLJ39877 


2 


AV648669 


47.4 


Above 


D.J. 


2 


201S76_at 


Paraoxonase/ 


PON2 


7q21.3 


NM_000305.1 


47.2 


Above 


1 C 1 
lo. / 






arylesterase 2 










Above 


J..O 


3 


201028_s_at 


Antigen identified 


MIC2 


Xp22.32 


U82 164.1 


44.3 






by monoclonal 


















antibodies 12E7, 


















F21 and013 












O.D 


4 


200953_s_at 


CyclinD2 


CCND2 


12pl3 


NM 001759.1 


42.3 


Above 


5 


202947_s__at 


Glycophorin C 


GYPC 


2ql4-q21 


NM_002101.2 


42.3 


Above 


3.1 






integral membrane 


















glycoprotein 










Above 


4.3 


6 


223449 at 


Semaphorin 6A 


SEMA6A 


5q23.1 


AF225425.1 


42.3 


7 


201029_s_at 


Antigen identified 


MIC2 


Xp22.32 


NMJ)02414.1 


41.2 


Above 


2.4 






by monoclonal 


















antibodies 12E7, 


















F21 and 013 














S 


204429_s_at 


Solute carrier 


SLC2A5 


lp36.2 


BE560461 


41.2 


Above 


D 






family 2 


















(facilitated 


















glucose/fructose 


















transporter), 


















member 5 












15. o 


9 


210830 s at 


Paraoxonase 


PON2 


7q21.3 


AF00 1602.1 


41.2 


Above 


10 


215028 at 


Semaphorin 6A 


SEMA6A 


5 


AB002438.1 


41.2 


Above 


4.5 


11 


220024_s_at 


Periaxin 


PRX 


19ql3.13 


NM_020956.1 


41.2 


Above 


5.2 










-ql3.2 








43.4 




jL,\J±y\JKJ o al. 




HYA22 


3p21.3 


NM 005808.1 


41.1 


Above 


13 


209365ls_at 


Extracellular 


ECM1 


lq21 


U65932.1 


41.1 


Above 


6 






matrix protein 1 










Above 


10.9 


14 


2386S9_at 


GPR110G 


GPR110 


6 


BG426455 


41.1 






protein-coupled 


















receptor 110 










Above 


12.4 


1 j 


zzz 1j4_s ai 


DKFZP56 


2q33.1 


AK002064.1 


40.4 






DKPZP564A2416 


4A2416 
















unknown protein 


















with a histone H5 


















signature. 










Above 


1.5 


16 


218084_x_at 


FXYD domain- 


FXYD5 


19ql2- 


NM_014164.2 


38 






containing ion 




ql3.1 














transport regulator 














17 


212242_at 


5 

Tubulin, alpha 1 


TUBA1 


2q36.2 


AL565074 


37 


Above 


3.2 






(testis specific) 










Above 


10.8 


18 


201445 at 


Calponin 3, acidic 


CNN3 


Ip22-p21 


NM 001839.1 


36.3 


19 


20277 l_at 


KIAA0233 gene 




16q24.3 


NM_014745.1 


36.3 


Above 


1.9 






product 


K1AA023 
3 












20 


212298_at 


Neuropilin 1 


NRP1 


10pl2 


BE620457 


36.3 


Above 


13.8 
21 212458_at 

22 22248S_s_at 

23 222762_x_at 

24 20095 l_s_at 

25 204430 s at 



26 205467_at 

27 225660_at 

28 225913 at 



29 236489_at 

30 240173_at 

31 240499_at 

32 201310 s at 



33 215617_at 

34 242579_at 

35 202717 s at 



36 205055 at 



37 217967_s_at 

38 201656_at 

39 207196_s_at 

40 219315_s_at 

41 202123 s at 



42 219938_s_at 

43 228046_at 

44 64064_at 

45 222729 at 



FLJ21897 
Dynactin 4 
LIM domains 
containing 1 
Cyclin D2 
Solute carrier 
family 2 
(facilitated 
glucose/fructose 
transporter), 
member 5 
Caspase 10 
Semaphorin 6A 
FLJ21140 
(Ser/Thr protein 
kinase) 
EST 
EST 
EST 

P311 protein. 
Similar to 
gastrin/ cholecy sto 
kinin type B 
receptor. 
FLJ11754 
EST 

CDC16cell 
division cycle 16 
homolog 
Integrin, alpha E 
(antigen CD 103, 
human mucosal 
lymphocyte 
antigen 1) 
Chromosome 1 
QRF 24 

Integrin, alpha 6 
Nef-associated 
factor 1 
hypothetical 
protein FLJ2305S 
V-abl Abelson 
murine leukemia 
viral oncogene 
homolog 1 
Pro-Ser-Thr 
phosphatase 
interacting protein 
2 

EST;DKFZp434P 
0235 
Immune 
associated 
nucleotide 4 like 1 
F-box and WD-40 
domain protein 7 
(archipelago 
homolog, 
Drosophila) 



FLJ21897 2 AW138902 
DCTN4 5q31-q32 BE218028 
LIMD1 3p21.3 AU144259 

CCND2 12pl3 NM_001759.1 
SLC2A5 lp36.2 NMJ)03039.1 



CASP10 2q33-q34 NMJXH230.1 
SEMA6A 5q23.1 W92748 
FLJ21140 15 AK025943.1 



P311 



6 
4 
10 

5q21.3 



FLJ11754 2 
4 

CDC 16 13q34 



AI282097 
AI732969 
AA482221 
NM 004772.1 



AU145711 
AA935461 
NM 003903.1 



36.3 


Above 




36.3 


Above 


3.6 


36.3 


Above 


2.6 


35.3 


Above 


12.7 


35.3 


Above 


5.1 


35.3 


Above 


3.6 


35.3 


Above 


3.3 


35.3 


Above 




35.3 


Above 


16.7 


35.3 


Above 


10.3 


35.3 


Above 


1. 3 


35.2 


Below 


2.2 


35.2 


Above 


14.4 


35.2 


Above 


10.2 


34.4 


Above 


l.l 



ITGAE I7pl3 NM_002208.3 34.4 Below 2.1 



Clorf24 lq25 



AF288391.1 34.4 Above 



ITGA6 2q31.l NM_0002l0.l 33.9 

NAFl 5q32- NM_006058.l 32.2 
q33.l 

FLJ20898 I6pl3.12 NM__024600.1 32.2 

ABLl 9q34.l NM_005l57.2 31.4 



Above 
Above 

Above 

Above 



PSTPIP2 I8ql2 NM_024430.l 31. 2 Above 



DKFZp4 4 AA741243 31.2 Above 

34P0235 

IAN4L1 7q36 AI435089 30.9 Above 



3.2 

2.8 
1.4 

5.3 

1.8 



1.1 

3.3 



FBXW7 4q31.23 BE551877 30.5 Above 2.4 
46 


229975 at 


EST 




4 


AIS26437 


30.5 


Above 


9.1 


A *7 

47 


200864 s at RAB11A 


T) A D 1 1 A 

KABl 1A 


1 jqzl.3- 


JNM_UU4 003.1 


9Q 7 

zy. / 


ADOVe 


1 A 
1 .4 










n 09 11 










48 


203089_s_at 


Protease, serine, 


PRSS25 


2pl2 


NM_0 13247.1 


29.7 


Above 


1.7 


A A 

49 


205376_at 


25 

Inositol 


1JNPP413 


4q3 1 . 1 


JNM_U03c5oo.l 


zy. / 


Above 


1Z.4 






polyphosphates- 


















phosphatase, type 
II 














rn 
DV 


209229_s_at 


T^T A A 1 1 1 C 

KJLAA1 115 




1 Q^. 1 Q /lO 

iyqi3.4Z 


Jt5LA/uz / yy. i 


9Q 7 

zy. / 


Above 


1 .3 






protein 


JUAA1 1 1 

C 












C 1 

51 


219871_at 


Hypothetical 


3 

r?T Ti-ji oo 
rU 1319/ 


4pl4 


JN JV1_UZ40 14.1 


70 7 

zy. / 


Above 


14.3 






protein FLJ13197 














CO 


222868_s_at 


Interleukin 18 


TT 1 QDT) 


1 1 n 1 1 

1 iqi D 


A TCO 1 C40 
A13ZlD4y 


OO 7 

zy. / 


AU 

Above 


7 1 
/ . 1 






binding protein 














53 


2359S8_at 


GPR110 G 


GPR110 


6pl2.3 


AA74603S 


29.7 


Above 


15.8 






protein-coupled 


















receptor 110 














54 


239273_s_at 


Matrix 


TV /T\ /f "DO C 


1 1 1 

1 /ql 1- 


AlVZ / ZUo 


OA n 

zy. / 


A t~ 

Above 


OA C 






metalloproteinase 
28 




„o i i 
qzi .1 










55 


206150 at 


Tumor necrosis 




12pl3 


\tx <r AA1 T/l^ 1 

NM_00 1242.1 


Z9.5 


Above 


Q O 

3.z 






factor receptor 


1 JNr±*Lo.r / 
















superfannly, 


















member 7 














56 


212203_x_at 


Interferon induced 


TT7TT1V >ro 


oqli.l 


0X0005/4 / 


oo c 
zy.D 


Above 


9 ^ 
Z.J 






transmembrane 


















protem 3 














57 


217110_s_at 


Mucin 4 


TV ATT T/~*A 

MUL4 


3qZ9 


A TO/fOC/n 1 

AJZ4ZD4/.1 


oo c 
Zy.D 


Above 


/lO c 
4 /.3 


CO 

5b 


223075_s_at 


hypothetical 


rL,J 1Z /o3 


yq34.13- 


A T 1 KC/^/^ 1 
AlwloODOO.l 


OQ C 

zy.D 


Above 


Q o 






protein FLJ12783 




q34.3 










59 


2291 3 9 at 


EST 




o 
5 


AlzOzzOl 


OA C 


Above 


in o 
lU.o 


60 


229367__s_at 


Hypothetical 


rUzzo9U 


/ 


AW 130o3o 


OO c 


Above 


3.0 






proteins 


















FLJ22690. 














/CI 

ol 


213093 at 


FLJ30869 




xqzo 


AT/171 a'TC 
A14 I ID ID 


OQ 1 

zy.i 


Above 


9 C 


62 


216033_s_at 


FYN oncogene 


FYN 


6 


S74774.1 


29.1 


Above 


2.7 






related to SRC 
















2023 69_s at 


r|iTTi a A A 1*1 

TRAM-hke 


V T a A AAC 

KJAAUU5 


opz 1.1- 


JNiVL_UlZZcS5. 1 


OQ 7 
Zo. / 


Above 








protein 


7 


plz 










64 


212592_at 


immunoglobulin J 


lLrJ 


4qzl 


AV /33ZOO 


OO 7 
Zo. / 


Above 


•7 O 

/.y 






polypeptide, linker 
















protein for 


















unmunoglobulin 


















alpha and mu 


















polypeptides 














/CC 

65 


219218_at 


hypothetical 




1 /qzj.3 


JNJVl_UZ40VO. 1 


OO O 
Zo. / 


Below 


A 9 

o.z 






protem FLJ2305S 














66 


242051 at 


EST 




Y 


AI695695 


ZO. / 


Above 


Z.Z 


67 


20065 5_s_at 


Calmodulin 1 


a t TV vT 1 

CALM1 


1 /I ^ O /I 

14q24- 


NM_UUoooo. 1 


OO c 
Zo.D 


Above 


1 1 
l.o 






(phosphorylase 




q31 














kinase, delta) 














Oo 


202794_at 


Inositol 


rXTT>T> 1 

AiNJrJr 1 


zq,5Z 


INI VI UUZ IV^t.Z 


Zo.*+ 


Above 


1 

1 .o 






polyphosphate- 1 - 
















phosphatase 














69 


218348 s at 


HSPC055 protein 


HSPC055 


16pl3.3 


NM 014153.1 


27.7 


Below 


1.1 


70 


205269_at 


Lymphocyte 


LCP2 


5q33.1- 


AI123251 


26.9 


Above 


1.6 



cytosolic protein 2 qter 
71 238488_at 



72 202242 at 



73 218764_at 

74 22481 l_at 

75 225799_at 

76 228297_at 

77 203508 at 



78 20807 l_s_at 

79 20932 l_s_at 

80 226345_at 

81 200863_s_at 

82 205270_s_at 

83 20888 l_x_at 

84 212862_at 



Ran binding 
protein 1 1 

Transmembrane 4 
superfamily 
member 2 
Hypothetical 
protein MGC5363 
FLJ30652 
Hypothetical 
protein MGC4677 
Calponin 3, acidic 
Tumor necrosis 
factor receptor 
superfamily, 
member IB 
Leukocyte- 
associated Ig-like 
receptor 1 
Adenylate cyclase 
3. 

DKFZp43401317 





5ql2.2 


BF511602 


26.9 


Above 




LOC5119 










4 

TM4SF2 


Xqll.4 


NM_004615.1 


26.6 


Above 


1.7 




14q22.1- 


NM_024064.1 


26.6 


Above 


1.7 


MGC5363 
FLJ30652 


q22.3 
3 

2ql2.3 


BF 11 2093 
BF209337 


Zo.o 
26.6 


Above 
Above 


2.2 


MGC4677 
CNN3 


Ip22-p21 
lp36.3- 


AI807004 
NM_00 1066.1 


26.6 
26 


Above 
Above 


4.7 
2.6 


TNFRSF1 
B 


p36.2 










LAIR1 


19ql3.4 


NMJ)21708.1 


26 


Above 


2 


ADCY3 


2p24-p22 


AF033861.1 


26 


Above 


2.1 




10 


AW270158 


26 


Below 


1.4 



85 213385_at 

86 218013__x_at 

87 218966_at 

88 200742 s at 



89 203217__s_at 

90 205259 at 



91 220684_at 

92 225244 at 



RAB 11 A, member 
RAS oncogene 
family 
Lymphocyte 
cytosolic protein 2 
Isopentenyl- 
diphosphate delta 
isomerase 
CDP- 

diacylglycerol 
synthase 

cytidylyltransferas 
e)2 

Chimerin 2 
Dynactin 4 
Myosin 5C 
Ceroid- 
lipofuscinosis, 
neuronal 2, late 
infantile (Jansky- 
Bielschowsky 
disease). A 
pepstatin- 
insensitive 
lysosomal 
peptidase. 
Sialyltransferase 9 
Nuclear receptor 
subfamily 3, 
group C, member 
2 

T-box21 
BVIAGE3451454: 
GRASP protein 



DKFZp43 
401317 

RAB11A 15q21.3- 
q22.31 

LCP2 5q33.1- 
qter 

IDI1 10pl5.3 



AI215102 

NM_005565.2 
BC005247.1 



25.8 

25.8 
25.8 



CHN2- 7 

DCTN4 5q31-q32 

MY05C 15q21 

CLN2 lip 15 



AK026415.1 
NMJH6221.1 
NM_01S728.1 
BG231932 



25.8 
25.8 
25.8 
25 



SIAT9 
NR3C2 



2pll.2 
4q31.1 



TBX21 17q21.2 
BV1AGE34 lq42.13 
51454 




NMJXB896.1 
NM 000901.1 



NMJH3351.1 
AA019893 



25 
25 



25 
25 



Above 

Above 
Below 



CDS2 20pl3 AL568982 25.8 Above 



Above 
Above 
Above 
Above 



Above 
Above 



Above 
Above 



1.4 

1.6 
1.7 

1.8 



3 
3.6 
1.8 
1.5 



1.8 
1.9 



3.3 
2 
93 
94 



96 
97 



239519_at 
203005 at 



95 200665_s_at 



204004_at 
204576_s_at 



EST 

Lymphotoxin beta 
receptor (TNFR 
superfamily, 
member 3) 
Secreted protein, 
acidic, cysteine- 
rich (osteonectin) 
PRKC, apoptosis, 
WT1, regulator 
KIAA0643 
protein 



98 214255_at ATPase, Class V, 

type 10C 

99 216985_s_at Syntaxin 3A 

100 48106 at FLJ20489 



LTBR 



SPARC 



PAWR 



KIAA064 
3 

ATP10C 
STX3A 



10 

12pl3 



5q31.3- 
q32 

12q21 

16pl2.3 



15qll- 
ql3 

llql2.3 



FLJ20489 12pll.l 



AA927670 
NM_002342.1 


25 
24.3 


Above 
Above 


18.2 
10 


NM_003 118.1 


24.3 


Above 


9.8 


AI336206 


24.3 


Above 


3 


AA207013 


24.3 


Above 


2 


AB011138.1 


24.3 


Above 


9.9 


AJ002077.1 
H14241 


24.3 
24.3 


Above 
Above 


12 
2.8 



Table 64. Top 100 chi-square probe sets selected for E2A-PBX1 

No. | U133 probe set | Gene description | Symbol | Chromosomal location | GenBank reference | Chi-square value | E2A above/below mean | Fold change 
1 | 201579_at | FAT tumor suppressor homolog 1 (Drosophila) | FAT | 4q34-q35 | NM_005245.1 | 88.0 | Above | 9.9 
2 | 201695_s_at | nucleoside phosphorylase | NP | 14q13.1 | NM_000270.1 | 88.0 | Above | 3.8 
3 | 204674_at | lymphoid-restricted membrane protein | LRMP | 12p12.3 | NM_006152.1 | 88.0 | Above | 5.8 
4 | 205253_at | pre-B-cell leukemia transcription factor 1 | PBX1 | 1q23 | NM_002585.1 | 88.0 | Above | 3549.2 
5 | 212148_at | pre-B-cell leukemia transcription factor 1, splice variant | PBX1 | 1q23 | BF967998 | 88.0 | Above | 5283.5 
6 | 212151_at | pre-B-cell leukemia transcription factor 1, splice variant | PBX1 | 1q23 | BF967998 | 88.0 | Above | 7472.2 
7 | 212371_at | DKFZp586C1019 | DKFZp586C1019 | 1 | AL049397.1 | 88.0 | Above | 2.5 
8 | 219155_at | retinal degeneration B beta | RDGBB | 17q24.2 | NM_012417.1 | 88.0 | Above | 2.7 
9 | 225483_at | hypothetical protein MGC10485 | MGC10485 | 11q25 | AI971602 | 88.0 | Above | 7.7 
10 | 227439_at | E2a-Pbx1-associated protein | EB-1 | 12 | AW005572 | 88.0 | Above | 269.8 



11 227949_at 

12 230306 at 



13 231095_at 

14 203372_s_at 

15 20602S_s_at 

16 206181_at 

17 208788 at 



18 209760_at 

19 35974„at 

20 38340_at 

21 208644_at 

22 212789_at 

23 221113_s_at 

24 224022 x at 



25 231040_at 

26 232289_at 

27 235666_at 

28 203373_at 

29 210785 s at 



30 224733 at 



31 225235 at 



Q9H4T4 like 
hypothetical 
protein 
MGC10485 
retinal 

degeneration B 
beta 

STAT induced 
STAT inhibitor-2 
c-mer proto- 
oncogene tyrosine 
kinase 
signaling 
lymphocytic 
activation 
molecule 

homolog of yeast 
long chain 
polyunsaturated 
fatty acid 
elongation 
enzyme 2 
KIAA0922 
protein 
lymphoid- 
restricted 
membrane protein 
huntingtin 
interacting protein 
12 

ADP- 

ribosyltransferase 

(NAD+; poly 

(ADP-ribose) 

polymerase) 

KIAA0056 

protein 

wingless-type 

MMTV 

integration site 
family, member 
16 

wingless-type 
MMTV 

integration site 
family, member 
16 
EST 

FLJ14167 
EST 

STAT induced 
STAT inhibitor-2 
basement 
membrane- 
induced gene 
chemokine-like 
factor super 
family 3 
hypothetical 



H17739 
MGC1048 
5 



RDGBB 

SOCS2 
MERTK 

SLAM 
HELOl 



KIAA092 
2 

LRMP 



HIP12 
ADPRT 



KLAA005 
6 

WNT16 



WNT16 



FLJ14167 
FLJ20489 
SOCS2 

ICB-1 
CKLFSF3 



20ql3.32 
llq25 


AL357503 
AA514326 


88.0 


Above 
Above 


59.3 


17q24.2 


AW193811 


88.0 


Above 


25.6 


12q 


AB004903.1 


80.6 


Below 


23.4 


2ql4.1 


NMJ306343.1 


80.6 


Above 


23.7 


Iq22-q23 


NMJ)03037.1 


80.6 


Above 


6.3 


6p21.1- 
pl2.1 


AL136939.1 


80.6 


Above 


2.2 


4q31.23 


AL136932.1 


80.6 


Above 


2.9 


12pl2.3 


U10485 


80.6 


Above 


O.I 


12q24 


AB014555 


80.6 


Above 


3.8 


Iq41-q42 


M32721.1 


80.2 


Above 


3.0 




AI796581 


80-2 


Above 




7q31 


NM_01 6087.1 


80.2 


Above 


2547.6 


7q31 


AF169963.1 


80.2 


Above 


569.1 


9 
17 
10 
12q 


AW5 12988 
BF237871 
AA903473 
NM_003877.1 


80.2 
80.2 
80.2 
74.2 


Above 
Above 
Above 
Below 


16.4 
144.1 

OD4.0 

24.8 


lp35.3 


AB035482.1 


Til A 

74.2 


Below 


A 1 

4. 1 


. 16q23.1 


AL574900 


74.2 


Below 


41.7 


> 5q35.3 


AW007710 


74.2 


Above 


3.6 






protein 9 
MGC14859 

nidogen 2 NID2 

(osteonidogen) 

33 211913_s_at c-mer proto- MERTK 

oncogene tyrosine 
kinase 

34 219551_at uncharacterized BM040 

bone marrow 
protein BM040 

35 223693_s_at hypothetical FLJ10324 

protein FLJ1 0324 
moesin MSN 



32 204114 at 



36 200600_at 

37 213909_at 

38 221669_s_at 



39 235911 at 



40 243533_x_at 

41 202615_at 

42 204774_at 

43 218283 at 



44 209130_at 

45 228580_at 

46 202796_at 

47 218640_s_at 

48 235099 at 



49 2018S9_at 

50 202106_at 

51 20220S_s_at 

52 205173_x_at 



FLJ12280 
acyl-Coenzyme A 
dehydrogenase 
family, member 8 
ESTs, Weakly 
similar to PIHUB6 
salivary proline- 
rich protein 
precursor PRB1 
(large allele) 
ESTs 

DKFZp686D0521 

ecotropic viral 
integration site 2A 
synovial sarcoma 
translocation gene 
on chromosome 
18-like2 
synaptosomal- 
associated protein, 
23kDa 

serine protease 

HTRA3 

synaptopodin 

phafin 2 

ESTs, Weakly 
similar to 
PLLP_HUMAN 
Plasmolipin 
[H.sapiens] 
family 
sequence 
similarity 
member C 
golgi autoantigen, 
golgin subfamily 
a, 3 

ADP-ribosylation 
factor-like 7 
CD58 antigen, 
(lymphocyte 
function- 
associated antigen 



FLJ12280 
ACAD8 



DKFZp68 

6D0521 

EVI2A 

SS18L2 



14q21- 

q22 

2ql4.1 


NM_0073ol.l 
L08961.1 


HI 1 

/i.l 
72.8 


Above 
Above 


1ST 

JL J . 1 

37.7 


3q21.1 


NM_0 18456.1 


72.8 


Above 


3.0 


7p22 


AL136731.1 


72. S 


Above 


oo.o 


Xqll.2- 
ql2 

3 

llq25 


NM_002444.1 

AU147799 
BC001964.1 


72.5 

72.5 
72.5 


Below 

Above 
Above 


2.2 

12.5 
2.6 


3 


AI885815 


72.5 


Above 


36.6 


9 


H09663 
BF222895 


72.5 
68.6 


Above 
Below 


23.2 
6.2 



17qll.2 NM_014210.1 68.6 
3p21 NMJ)16305.1 68.6 



SNAP23 15ql4 BC003686.1 67.8 
HTRA3 



KIAA102 
9 

FLJ13187 



4pl6.1 
5q33.1 

8q21.3 

3 



AI828007 66.6 

NM_007286.1 66.5 

NM_024613.1 66.5 

AW080832 66.5 



with FAM3C 7q22.1- NM_014888.1 65.3 
q3Ll 

3', 

GOLGA3 12q24.33 NM_005895.1 65.3 

ARL7 2q37.2 BC001051.1 65.3 

CD58 lpl3 NM_001779.1 65.3 




Below 
Above 

Below 

Above 

Above 

Above 
Above 

Above 

Above 

Above 
Above 



3.0 
1.6 

1.9 

3.8 

52.3 

3.1 
6.7 

4.6 

3.3 

3.2 
2.4 










3) 




53 | 211744_s_at | CD58 antigen, (lymphocyte function-associated antigen 3) | CD58 | 1p13 | BC005930.1 | 65.3 | Above | 2.5 
54 | 212552_at | hippocalcin-like 1 | HPCAL1 | 2p25.1 | BE617588 | 65.3 | Below | 2.6 
55 | 213358_at | KIAA0802 protein | KIAA0802 | 18p11.21 | AB018345.1 | 65.3 | Above | 12.7 
56 | 222699_s_at | phafin 2 | FLJ13187 | 8q21.3 | BF439250 | 65.3 | Above | 3.5 
57 | 225618_at | EST | | 17 | AI769587 | 65.3 | Below | 5.3 
58 | 238778_at | DKFZp451L157 | DKFZp451L157 | 10 | AI244661 | 65.3 | Above | 23.5 
59 | 239427_at | ESTs | | 1 | AA131524 | 65.3 | Above | 13.7 
60 | 47069_at | Rho GTPase activating protein 8 | ARHGAP8 | 22q13.31 | AA533284 | 65.3 | Above | 3.3 
61 | 205769_at | solute carrier family 27 (fatty acid transporter), member 2 | SLC27A2 | 15q21.2 | NM_003645.1 | 65.1 | Above | 56.0 
62 | 210786_s_at | Friend leukemia virus integration 1 | FLI1 | 11q24.1-q24.3 | M93255.1 | 65.1 | Above | 2.2 
63 | 212985_at | DKFZp434E033 | DKFZp434E033 | 4 | BF115739 | 65.1 | Above | 7.1 
64 | 227441_s_at | E2a-Pbx1-associated protein | EB-1 | 12 | AW005572 | 65.1 | Above | 1139.4 
65 | 234261_at | DKFZp761M10121 | DKFZp761M10121 | 12 | AL137313.1 | 65.1 | Above | 960.8 
66 | 244565_at | ESTs | | 10 | AI685824 | 65.1 | Above | 7.6 
67 | 202181_at | KIAA0247 gene product | KIAA0247 | 14q24.1 | NM_014734.1 | 63.7 | Above | 1.8 
68 | 202207_at | ADP-ribosylation factor-like 7 | ARL7 | 2q37.2 | NM_005737.2 | 63.7 | Above | 3.2 
69 | 207571_x_at | basement membrane-induced gene | ICB-1 | 1p35.3 | NM_004848.1 | 63.7 | Below | 4.4 
70 | 209558_s_at | huntingtin interacting protein 12 | HIP12 | 12q24 | AB013384.1 | 61.1 | Above | 23.8 
71 | 213005_s_at | KIAA0172 protein | KIAA0172 | 9p24.3 | D79994.1 | 61.1 | Above | 8.3 
72 | 236854_at | cDNA DKFZp667F0617 | DKFZp667F0617 | 20 | AA743694 | 61.1 | Above | 12.6 
73 | 226233_at | tubulin-specific chaperone e | TBCE | 1q42.3 | BG112197 | 60.0 | Above | 2.6 
74 | 203435_s_at | membrane metallo-endopeptidase (neutral endopeptidase, enkephalinase, CALLA, CD10) | MME | 3q25.1-q25.2 | NM_007287.1 | 59.9 | Below | 2.2 
75 | 202478_at | GS3955 protein | GS3955 | 2p25.1 | NM_021643.1 | 59.3 | Above | 4.0 
76 | 202479_s_at | GS3955 protein | GS3955 | 2p25.1 | BC002637.1 | 59.3 | Above | 3.3 
77 | 203999_at | synaptotagmin I | SYT1 | 12cen-q21 | NM_005639.1 | 59.3 | Above | 3.9 
78 | 212149_at | KIAA0143 protein | KIAA0143 | 8q24.12 | AA805651 | 59.3 | Below | 13.5 






79 212873_at 

80 21S346_s_at 

81 224856_at 

82 20081 l_at 

83 201722 s at 



binding FKBP5 
CIRBP 



inducible 
binding 



84 22371 l_s_at 

85 233273_at 

86 201460 at 



87 20242 l_at 

88 217983_s_at 

89 218087_s_at 

90 218491_s_at 

91 201825_s_at 

92 202206_at 

93 218683_at 

94 226590_at 

95 227440_at 

96 229770_at 

97 40148 at 



98 212959_s_at 

99 203143_s_at 

100 209683 at 



minor HA-1 
histocompatibility 
antigen HA-1 
p53 regulated PA26 
PA26 nuclear 
protein 
FK506 
protein 5 
cold 
RNA 
protein 

UDP-N-acetyl- 
alpha-D- 

galactosamine:pol 
ypeptide N- 
acetylgalactosami 
nyltransferase 1 
(GalNAc-Tl) 
HSPC144 protein 
cDNA FLJ12010 
fis 

mitogen-activated 
protein kinase- 
activated protein 
kinase 2 

immunoglobulin 
superfamily, 
member 3 
ribonuclease 6 
precursor 
sorbin and SH3 
domain containing 
1 

HSPC144 protein 
CGI-49 protein 



19pl3.3 
6q21 



6p21.3- 

21.2 

19pl3.3 



BE349017 



59.3 Below 



NM 014454.1 59.3 Below 



AL122066.1 
NM 001280.1 



59.3 
59.1 



Below 
Below 



HSPC144 
FLJ12010 

MAPKAP 
K2 



IGSF3 



RNASE6P 
L 

SORBS 1 



ADP-ribosylation 
factor-like 7 
polypyrimidine 
tract binding 
protein 2 

cDNA clone 
EUROIMAGE 
1517766 
E2a-Pbxl- 
associated protein 
hypothetical 
protein FLJ31978 
amyloid beta (A4) 
precursor protein- 
binding, family B, 
member 2 (Fe65- 
like) 

MGC4170 protein 
KIAA0040 gene 
product 
hypothetical 
protein 

DKFZp566A1524 



HSPC144 
LOC5109 
7 

ARL7 



PTBP2 



EB-1 

FLJ31978 

APBB2 



llq25 
1 

lq32 



lpl3 

6q27 

10q23.3- 
q24.1 

llq25 
lq44 

2q37.2 

lp22.11- 
p21.3 



12 

12q24.33 
4pl4 



NM_014174.1 
AL572542 

NM_005737.2 

NM_021 190.1 

AA031404 

AW005572 



57.9 
57.8 

57.8 

57.8 

57.8 

57.8 



Above 
Above 

Above 

Above 

Above 



2.9 

4.7 

5.5 
5.8 



GALNT1 18ql2.1 NM_020474.2 59.1 Below 1.8 



AF182413.1 


59.1 


Above 


2.0 


AU 146834 


59.1 


Above 


30.6 


AI141802 


57.9 


Above 


2.1 


AB007935.1 


57.9 


Above 


4.4 


NM_003730.2 


57.9 


Below 


3.4 


NMJH5385.1 


57.9 


Above 


25.1 



1.4 
2.2 

3.9 

1.8 

3.1 



Above 1168.9 



MGC4170 
KIAA004 
0 

DKFZP56 
6A1524 



12q23.1 
lq24-25 

2p24.2 



AI041543 


57.8 


Above 


51.8 


U62325 


57.8 


Above 


6.2 


AK001821.1 


57.2 


Below 


3.0 


T79953 


56.3 


Above 


2.4 


AA243659 


56.3 


Below 


10.0 
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Table 65. Top 100 chi-square probe sets selected for Hyperdiploid >50 



U133 probe 
set 



Gene description Symbol 



HD 

Chromo- Chi- above/ 

somal square below Fold 

Location GenBankRef value mean change 

TTTTZ /~\ r\ r*\ a a a ~% "i a r\ at 1 n 



1 200600_at 

2 200737_at 

3 200980_s_at 

4 201136_at 

5 201807_at 

6 202214_s_at 

7 202557_at 

8 202593_s_at 

9 203680_at 

10 204194_at 

11 205324_s_at 

12 208598_s_at 

13 208861 s at 



14 211342 x_at 



Moesin 
(membrane- 
organizing 
extensio spike 
protein) 

Phosphoglycerate 
kinase 1 
Pyruvate 
dehydrogenase 
(lipoaniide) alpha 
1 

Proteolipid protein 
2 (colonic 
epithelium- 
enriched) 
Vacuolar protein 
sorting 26 (yeast) 
Cullin4B 
Stress 70 protein 
chaperone, 
microsome 
associated, 60 kD 
membrane 
interacting protein 
ofRGS16 
Protein kinase, 
cAMP-dependent, 
regulatory, type II, 
beta 

BTB and CNC 
homology 1, basic 
leucine zipper 
transcription 
factor 1 

FtsJ homolog 1 
(E. coli) 
Upstream 
regulatory element 
binding protein 1 
Alpha 

thalassemia/menta 
1 retardation 
syndrome X- 
linked(RAD54 
homolog, S. 
cerevisiae) 
trinucleotide 
repeat containing 
11(THR~ 
associated protein, 
230 kDa subunit) 



MSN Xqll.2- NM_002444.1 34.0 Above 1.9 
ql2 



PGK1 Xql3 NMJ)00291.1 34.0 Above 1.8 

PDHA1 Xp22.2- NM_000284.1 34.0 Above 1.7 
p22.1 



PLP2 Xpll.23 NM_002668.1 34.0 Above 3.3 



VPS26 

CUL4B 
STCH 



MIR16 



PRKAR2 
B 



10q21.1 NM_004896.1 34.0 Above 1.7 

Xq23 NM_003588.1 34.0 Above 1.9 
21qll AI718418 34.0 Above 2.0 



16pl2- NM_016641.1 34.0 Below 1.6 
pi 1.2 

7q22- NM_002736.1 34.0 Above 3.3 
q31.1 



BACH1 21q22.11 NMJXH 186.1 34.0 Above 1.8 



FTSJ1 
UREB1 

ATRX 



Xpll.23 NM_012280.1 34.0 Above 
Xpll.22 NMJ)05703.2 34.0 Above 



Xql3.1- U72937.2 
q21.1 



34.0 Above 



2.1 
1.6 

1.7 



TNRC11 Xql3 BC004354.1 34.0 Above 1.8 



-147- 



BW3DOCID. «-WO_ 



_03083I40A2_I_5 



WO 03/083140 

15 216071_x_at 

16 218573_at 

17 219485_s_at 

18 200655_s_at 

19 200738_s_at 

20 200944_s_at 



21 201092_at 

22 201100_s_at 

23 201688_s_at 

24 201899_s_at 

25 202325_s_at 

26 202829_s_at 

27 202854_at 

28 206846_s_at 

29 209370_s_at 

30 209565_at 

31 212846_at 

32 217356_s_at 

33 218163_at 

34 218386 x_at 



Trinucleotide 
repeat containing 
11 

APR-1 
prptein/melanoma 
-associated 
antigen 
proteasome 
(prosome, 
macropain) 26 S 
subunit, non- 
ATPase, 10 
Calmodulin 1 
(phosphorylase 
kinase, delta) 
Phosphoglycerate 
kinase 1 
High-mobility 
group (nonhistone 
chromosomal) 
protein 14; 
member of the 
HMG 14/17 
family 

Retinoblastoma 
binding protein 
7/RbAp46 
Ubiquitin specific 
protease 9 
Tumor protein 
D52 

Ubiquitin- 
conjugating 
enzyme E2A 
(RAD6 homolog) 
ATP synthase, H+ 
transporting, 
mitochondrial F0 
complex, subunit 
F6 

Synaptobrevin- 
like 1 

Hypoxanthine 
phosphoribosyltra 
nsferase 1 (Lesch- 
Nyhan syndrome) 
Histone 
deacetylase 6 
SH3-domain 
binding protein 2 
zinc finger protein 
183 

KIAA0179 
protein. 

Phosphoglycerate 
kinase 

MCT-1 protein 
Ubiquitin specific 
protease 16; de- 



TNRC11 
MAGEH1 



Xql3 
Xpll.22 



AF 132033 
NM 014061.1 



PCT/US03/08486 

34.0 Above 1.8 

34.0 Above 3.0 



PSMD10 Xq22.3 NMJM>2814.1 34.0 Above 2.4 



CALM1 

PGK1 
HMG14 



14q24- 
q31 

Xql3 

21q22.2 



NM 006888.1 30.1 Above 1.7 



NM_000291.1 
NM 004965.1 



30.1 Above 1.8 
30.1 Above 1.7 



RBBP7 Xp22.31 NM_002893.2 30.1 Above 1.6 



USP9X 
TPD52 
UBE2A 



Xpll.4 

8q21 

Xq24- 
q25 



NM_004652.2 
BE974098 
Ivnvl 003336.1 



30.1 Above 1.7 
30.1 Below 4.1 
30.1 Above 1.8 



ATP5J 21q21.1 NM_001685.1 30.1 Above 1.6 



SYBL1 


Xq28 


NM__005638.1 


30.1 


Above 


1.5 


HPRT1 


Xq26.1 


NMJ)00194.1 


30.1 


Above 


1.4 


HDAC6 


Xpll.23 


NM_006044.2 


30.1 


Above 


1.5 


SH3BP2 


4pl6.3 


AB000462.1 


30.1 


Above 


3.1 


ZNF183 


Xq25- 
q26 

21q22.3 


BC000832.1 


30.1 


Above 


2.2 


KIAA017 
9 

PGK1 


D80001.1 


30.1 


Above 


2.0 


Xql3 


S81916.1 


30.1 


Above 


1.8 


MCT-1 
USP16 


Xq22-24 
21q22.U 


NM 014060.1 
NM_006447.1 


30.1 
30.1 


Above 
Above 


1.8 
1.7 
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BNSDOCID: <WO 03083 140A2_I_> 



WO 03/083.140 



PCT/US03/08486 



35 218402_s_at 

36 21S495_at 

37 218499 at 



38 218757_s_at 

39 219038_at 

40 229967 at 



41 242794_at 

42 201132 at 



43 201312 s at 



44 201894 s at 



45 201923_at 

46 202371_at 

47 203126 at 



48 204219_s_at 

49 204835_at 

50 212071_s_at 

51 212419_at 

52 21271 8_at 

53 213502 x at 



ubiquitinates 
histone H2A; 
ubiquitous 
expression. 
Hermansky-Pudlak syndrome 4
Ubiquitously-expressed transcript
Mst3 and SOK1-related kinase/STE20-like kinase; contains a Ser/Thr protein kinase domain
Similar to yeast Upf3, variant B
Hypothetical protein FLJ11565
Chemokine-like factor super family 2.
EST
Heterogeneous nuclear ribonucleoprotein H2 (H')
SH3 domain binding glutamic acid-rich protein like
Decorin; glycoprotein that binds to type I collagen fibrils & plays a role in matrix assembly.
Peroxiredoxin 4
Hypothetical protein FLJ21174
Inositol(myo)-1(or 4)-monophosphatase 2
proteasome (prosome, macropain) 26S subunit, ATPase, 1
polymerase (DNA directed), alpha
Spectrin, beta, non-erythrocytic 1
EST
Hypothetical protein MGC5370
Homo sapiens cDNA FLJ32313



HPS 4 
UXT 
MST4 



Xp 11.23- 
pll.22 

Xq26.1 



NM_022081.1 30.1 Below
NM_004182.1 30.1 Above
NM_016542.1 30.1 Above



UPF3B 

FLJ11565 

CKLFSF2 

HNRPH2 



SH3BGR 
L 



NM_023010.1 30.1 
NM_024657.1 30.1 
16q23.1 AA778552 30.1 



Xq25- 
q26 
Xq22.2 



4q31.1 AI569476 30.1 
Xq22 NM_019597.1 30.0 



Above 
Above 
Above 



Above 
Above 



Xql3.3 NM_003022.1 30.0 Above 



3.4 



1.5 



2.5 



23 
6.9 
4.3 



3.2 
2.0 



1.6 



DCN 12q13.2 NM_001920.1 30.0 Above 1.5
PRDX4 Xp22.13 NM_006406.1 30.0 Above 1.9
FLJ21174 Xq22.1 NM_024863.1 30.0 Above 3.6
IMPA2 18p11.2 NM_014214.1 30.0 Above 4.1
PSMC1 19p13.3 NM_002802.1 30.0 Above 1.3
POLA Xp22.1-p21.3 NM_016937.1 30.0 Above 2.0
SPTBN1 2p21 BE968833 30.0 Below 1.7
10q22.3 AL049949.1 30.0 Above 13.1
MGC5378 14q32.2 BG110231 30.0 Above 1.5
FLJ32313 22q11.23 X03529 30.0 Below 1.8

54 214051_at
55 226039_at
56 227279_at
57 200642_at
58 200799_at
59 200943_at
60 201018_at
61 201311_s_at
62 201443_s_at
63 201472_at
64 201689_s_at
65 202602_s_at
66 203041_s_at
67 203102_s_at
68 203744_at



fis, clone PROST2003232, weakly similar to BETA-GLUCURONIDASE PRECURSOR (EC 3.2.1.31)
Thymosin, beta
Mannosyl (alpha-1,3-)-glycoprotein beta-1,4-N-acetylglucosaminyltransferase
hypothetical protein MGC15737
Superoxide dismutase 1, soluble
Heat shock 70kD protein 1A
High-mobility group (nonhistone chromosomal) protein 14; member of the HMG 14/17 family
Eukaryotic translation initiation factor 1A
SH3 domain binding glutamic acid-rich protein like
ATPase, H+ transporting, lysosomal interacting protein 2
Von Hippel-Lindau binding protein 1
Tumor protein D52
HIV TAT specific factor 1
Lysosomal-associated membrane protein 2
Mannosyl (alpha-1,6-)-glycoprotein beta-1,2-N-acetylglucosaminyltransferase

High-mobility 



TMSNB Xq21.33-q22.3 BF677486 30.0 Above 3.1
MGAT4A 2q11.2 AW006441 30.0 Above 3.0
MGC15737 Xq22.1 AA847654 30.0 Above 5.6
SOD1 21q22.11 NM_000454.1 26.7 Above 2.3
HSPA1A 6p21.3 NM_005345.3 26.7 Above 2.7
HMG14 21q22.2 NM_004965.1 26.7 Above 1.6
EIF1A Xp22.12 BE542684 26.7 Above 1.8
SH3BGRL Xq13.3 AL515318 26.7 Above 1.6
ATP6IP2 Xq21 AF248966.1 26.7 Above 1.9
VBP1 Xq28 NM_003372.2 26.7 Above 1.7
TPD52 8q21 BE974098 26.7 Below 4.3
HTATSF1 Xq26.1-q27.2 NM_014500.1 26.7 Above 1.5
LAMP2 Xq24 J04183.1 26.7 Above 3.1
MGAT2 14q21 NM_002408.2 26.7 Above 1.6
HMG4 Xq28 NM_005342.1 26.7 Above 1.9



69 205518_s_at



70 208683_at 

71 209440_at 

72 210786_s_at 

73 212070_at 

74 213334_x_at 

75 215117_at 

76 218694_at 

77 222741_s_at 

>-I.Ci W^O.O 

79 225105_at 

80 225406_at 

81 225553_at 

82 226199_at 

83 226875_at 

84 232974_at 

85 46323_at



group (nonhistone chromosomal) protein 4
Cytidine monophosphate-N-acetylneuraminic acid hydroxylase (CMP-N-acetylneuraminate monooxygenase)
Calpain 2, (m/II) large subunit; calcium-dependent Cys protease.
Phosphoribosyl pyrophosphate synthetase 1; purine biosynthesis.
Friend leukemia virus integration 1
G protein-coupled receptor 56
Three prime repair exonuclease 2
Recombination activating gene 2; V(D)J recombinase.
ALEX1 protein
hypothetical protein FLJ11101
SH3-domain kinase binding protein 1
clone MGC:23936 IMAGE:3838595, mRNA, complete cds
Twisted gastrulation
Homo sapiens cDNA FLJ12874 fis
Hypothetical protein MGC23937
Hypothetical protein FLJ32122
cDNA FLJ12417 fis
SCAN-1 Ca++-dependent ER nucleoside diphosphatase/apyrase



CMAH 6p22-p23 NM_003570.1 26.7 Below 2.9
CAPN2 1q41-q42 M23254.1 26.7 Above 2.2
PRPS1 Xq21-q27 BC001605.1 26.7 Above 1.4
FLI1 11q24.1-q24.3 M93255.1 26.7 Below 2.5
GPR56 16q13 AL554008 26.7 Above 2.4
TREX2 Xq28 BE676218 26.7 Above 1.7
RAG2 11p13 AW058148 26.7 Below 27.2
ALEX1 Xq21.33-q22.2 NM_016608.1 26.7 Above 2.8
FLJ11101 6p21.1 AI761426 26.7 Above 1.5
SH3KBP1 Xp22.1-p21.3 AF230904.1 26.7 Above 2.0
12q23.3 BF969397 26.7 Above
TSG 18p11.3 AA195009 26.7 Above 1.9
14q22.2 AL042817 26.7 Above 1.6
MGC23937 Xq13.1 AL563795 Above 2.1
FLJ32122 Xq24 AI742838 26.7 Above 2.3
Xp22.31 AU148256 26.7 Above 3.1
SHAPY 17q25.3 AL120741 26.7 Above 1.7



86 203694_s_at
87 200658_s_at
88 201898_s_at
89 203556_at
90 203745_at
91 203909_at
92 204446_s_at
93 205191_at
94 206874_s_at
95 208073_x_at
96 209056_s_at
97 210645_s_at
98 215773_x_at
99 215884_s_at
100 217954_s_at



DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 16
Prohibitin
ubiquitin-conjugating enzyme E2A (RAD6 homolog)
KIAA0854 protein
Holocytochrome c synthase (cytochrome c heme-lyase)
Solute carrier family 9 (sodium/hydrogen exchanger), isoform 6
Arachidonate 5-lipoxygenase
Retinitis pigmentosa 2 (X-linked recessive)
Ste20-related serine/threonine kinase
Tetratricopeptide repeat domain 3
CDC5 cell division cycle 5-like (S. pombe)
Tetratricopeptide repeat domain 3
ADP-ribosyltransferase (NAD+; poly(ADP-ribose) polymerase)-like 2
Ubiquilin 2
PHD finger protein 3



DDX16 6p21.3 NM_003587.2 26.3 Above 1.3



PHB 
UBE2A 



KIAA085 
4 

HCCS 



17q21 
Xq24- 
q25 



8q24.13 
Xp22.3 



AL560017 26.3 Above 
All 26625 26.3 Above 



NM_014943.1 26.3 Below 
AI801013 26.3 Above 



SLC9A6 Xq26.3 NM_006359.1 26.3 Above



ALOX5 
RP2 

SLK 

TTC3 
CDC5L 

TTC3 
ADPRTL2 

UBQLN2 
PHF3 



10qll.2 

Xpll.4- 
pll.21 



NM_000698.1 26.3 Above
NM_006915.1 26.3 Above



10q25.1 AL138761 26.3 Above 



21q22.2 NM_003316.1 26.3 

6p21 AW268817 26.3 

21q22.2 D83077.1 26.3 

14qll.2- AJ236912.1 26.3 
ql2 



Xpll.23- 

pll.l 

6 



AK001029.1 26.3 
NM 015153.1 26.3 



Above 
Above 

Above 
Above 



Above 
Above 



2.0 
1.6 



1.6 

2.1 

1.9 

4.2 
2.1 

1.6 

1.9 
1.4 

2.2 
1.6 

1.9 
1.5 



Table 66. Top 100 chi-square probe sets selected for MLL 

U133 probe set   Description   Symbol   Chromosomal Location   GenBankRef   Chi-square value   Above/below mean   Fold change



1 202603_at 

2 219463_at 



3 224772_at 

4 204069_at 



a disintegrin and metalloproteinase domain 10
chromosome 20 open reading frame 103
neuron navigator 1
Meis1, myeloid



ADAM10 15q22 N51370 44.6 Above 1.8
C20orf103 20p12 NM_012261.1 44.6 Above 24.7
NAV1 AB032977.1 44.6 Below 3.8
MEIS1 2p14-p13 NM_002398.1 44.4 Above 73.7



5 218966_at
6 226939_at
7 204446_s_at
8 206492_at
9 212588_at
10 215925_s_at
11 211733_x_at
12 212386_at
13 218764_at
14 218847_at
15 222409_at
16 242172_at
17 201153_s_at
18 210487_at
19 219686_at
20 226981_at
21 203375_s_at
22 221676_s_at
23 201152_s_at
24 221773_at
25 201162_at
26 201163_s_at
27 203836_s_at
28 203837_at



ecotropic viral integration site 1 homolog
myosin 5C
cDNA FLJ37247 fis
arachidonate 5-lipoxygenase
fragile histidine triad gene
protein tyrosine phosphatase, receptor type, C
CD72 antigen (ligand for CD5)
sterol carrier protein 2
cDNA FLJ11918 fis
Protein Kinase C eta isoform.
IGF-II mRNA-binding protein 2
coronin, actin binding protein, 1C
ESTs
muscleblind-like (Drosophila)
deoxynucleotidyltransferase, terminal
gene for serine/threonine protein kinase
Homo sapiens, IMAGE:4401491, mRNA
tripeptidyl peptidase II
coronin, actin binding protein, 1C
muscleblind-like (Drosophila)
ELK3, ETS-domain protein (SRF accessory protein 2)
insulin-like growth factor binding protein 7
insulin-like growth factor binding protein 7
mitogen-activated protein kinase kinase kinase 5
mitogen-activated



MY05C 
FLJ37247 


15q21 


NMJ)1 8728.1 

a 

A12U23Z / 


44.4 
dd d 


Below 
Above 


4.5 
6.9 


ALOX5 


10qll.2 


NM_000698.1 


40.7 


Below 


66.8 


FHIT 


3pl4.2 


NM__002012.1 


40.7 


Below 


36.6 


PTPRC 


Iq31-q32 


AI809341 


40.7 


Above 


2.3 


CD72 


9plL2 


AF283777.2 


40.7 


Above 


3.0 


SCP2 


lp32 


BC005911.1 


40.1 


Above 


1.5 


FLJ11918 




AK021980.1 


40.1 


Below 


3.1 


PRKCH 


14q22.1- 

q22.3 

3q28 


NMJ)24064.1 


40.1 


Below 


7.6 


IMP-2 


NM_006548.1 


40.1 


Above 


23.2 


COROIC 


12q24.1 


AL1 62070.1 


40.1 


Above 


4.8 



MBNL 

DNTT 

HSA2508 
39 



3q25 

10q23- 

q24 

4pl6.2 



TPP2 
COROIC 

MBNL 
ELK3 

IGFBP7 
IGFBP7 
MAP3K5 
MAP3K5 



13q32- 
q33 

12q24.1 



3q25 
12q23 

4ql2 
4ql2 
6q22.33 
6q22.33 

N50406 40.1 Above
NM_021038.1 40.0 Above
M11722.1 40.0 Below
NM_018401.1 40.0 Below
AW002079 37.4 Below
NM_003291.1 37.2 Above
BC002342.1 37.2 Above
NM_021038.1 36.2 Above
AW575374 36.2 Below
NM_001553.1 36.0 Above
NM_001553.1 36.0 Above
D84476.1 36.0 Above
NM_005923.2 36.0 Above



33.6 
2.1 

2.9 

28.3 

1.0 

1.6 
3.5 

2.2 
8.2 

4.3 
4.0 
13.9 
4.2 






29 213891_s_at
30 214895_s_at
31 226415_at
32 235879_at
33 212387_at
34 218988_at
35 228555_at
36 202975_s_at
37 201105_at
38 203434_s_at
39 212135_s_at
40 212136_at
41 230179_at
42 218217_at
43 225841_at
44 226668_at



protein kinase kinase kinase 5
cDNA FLJ11918 fis
a disintegrin and metalloproteinase domain 10
KIAA1576 protein
ESTs
cDNA FLJ11918 fis
bladder cancer overexpressed protein
EST; by BLAT calcium/calmodulin-dependent Protein Kinase type II Delta chain (CAMK GROUP I)
Rho-related BTB domain containing 3
lectin, galactoside-binding, soluble, 1 (galectin 1)
membrane metallo-endopeptidase (neutral endopeptidase, enkephalinase, CALLA, CD10)
calcium transporting ATPase plasma membrane protein.
calcium transporting ATPase plasma membrane protein.
cDNA DKFZp547P158
likely homolog of rat and mouse retinoid-inducible serine carboxypeptidase
hypothetical protein FLJ30525
Homo sapiens, similar to WD domain, G-beta repeat containing protein



FLJ11918 




AI927067 


36.0 


Below 


3.2 


ADAM 10 


15q22 


AU135154 


36.0 


Above 


1.9 


KIAA157 


16q22.1 


AA1 56723 


3o.U 


Above 


AO 7 


6 

FLJ11918 




AI697540 
AK021980.1 


DO. u 

35.8 


Above 
Below 


D .O 

3.3 


BLOV1 


12ql5 


NM_018656.1 


35.8 


Below 


16.3 


CAMK2D 




AA029441 


35.8 


Above 


3.1 



RHOBTB 

3 

LGALS1 



MME 



5q21.2 
22ql3.1 



3q25.1- 
q25.2 



N21138 35.3 Above 

NM_002305.2 34.5 Above 
AI433463 34.1 Below 



ATP2B4 



ATP2B4 



DKFZp54 

7P158 

RISC 



17q23.2 



FLJ30525 lp!3.2 



AW5 17686 34.1 Below 

AW5 17686 34.1 Below 

N52572 34.1 Below 

NM_021626.1 32.8 Above 

BE502436 32.8 Above 

W80623 32.8 Above 



5.5 



14.5 



31.2 



2.4 

2.1 

6.4 
3.4 

1.8 
2.4 






45 200989_at
46 201151_s_at
47 201563_at
48 203753_at
49 205668_at
50 206471_s_at
51 211302_s_at
52 212012_at
53 212063_at
54 213241_at
55 214651_s_at
56 218140_x_at
57 219988_s_at
58 223046_at
59 224150_s_at
60 224933_s_at
61 201078_at
62 205550_s_at
63 212382_at
64 225019_at
65 225202_at
66 228855_at
67 231899_at
68 52164_at



hypoxia-inducible factor 1, alpha subunit (basic helix-loop-helix transcription factor)
muscleblind-like (Drosophila)
sorbitol dehydrogenase
transcription factor 4
lymphocyte antigen 75
plexin C1
phosphodiesterase 4B, cAMP-specific
Melanoma associated gene
CD44 antigen
PLEXIN C1
homeo box A9
APMCF1 protein
hypothetical protein FLJ10597
egl nine homolog 1 (C. elegans)
p10-binding protein
hypothetical protein DKFZp761F0118
transmembrane 9 superfamily 9
brain and reproductive organ-expressed (TNFRSF1A modulator)
cDNA FLJ11918 fis
calcium/calmodulin-dependent protein kinase (CaM kinase) II delta
Rho-related BTB domain containing 3
nudix (nucleoside diphosphate linked moiety X)-type motif 7
KIAA1726 protein
chromosome 11 open reading



HIF1A 


14q21- 


NMJ30 1530.1 


32.2 


Below 


1.8 




q24 










MBNL 


3q25 


NM_021038.1 


32.2 


Above 


2.6 


SORD 


15ql5.3 


L29008.1 


32.2 


Above 


1.8 


TCF4 


18q21.1 


NM_003 199.1 


32.2 


Below 


9 Q 


LY75 


2q24 


NM 002349.1 


32.2 


Above 


9 1 

Z. 1 


PLXNC1 


12q23.3 


Njv1_0Ud /Ol.i 


io 9 


Above 


7.7 


PDE4B 


lp31 


L20966.1 


19 9 


Below 


3.0 


D2S448 


2pter- 


AF200348.1 


TOO 

32.2 


Below 


9 A 




p25.1 








3.1 


CD44 


llpl3 


BE903880 


32.2 


Above 


PLXNC1 


AF035307.1 


32.2 


Above 


2.5 


HOXA9 


7pl5-pl4 

' ST 


U41813.1 


32.2 


Above 


28.5 


APMCF1 


3q22.2 


NM 021203. 1 


32.2 


Above 


1.4 


FLJ 10597 


lp34.1 


NM_0 18 150.1 


32.2 


Above 


1.9 


EGLN1 


lq4z.l 


MA/1 fk99fK1 1 




Below 


4.2 


BITE 


3q22-q23 


AF289495.1 


32.2 


Above 


2.1 


DKFZp76 


10q22.1 


AB037801.1 


32.2 


Above 


1 Q 


1F0118 












TM9SF2 


13q32.3 


NMJ304800.1 


32.0 


Above 


1.5 


BRE 


2p23.3 


NM_004899.1 


32.0 


Above 


2.0 


FLJ11918 




AK021980.1 


32.0 


Below 


9 7 
Z. / 


CAMK2D 


4q25 


AA777512 


32.0 


Above 


3.6 


RHOBTB 


5q21.2 


BE620739 


32.0 


Above 


5.5 



3 

NUDT7 



KIAA172 
6 

Cllor£24 



llq23.1 
llql3 



AI927964 

AB051513.1 
AA065185 



32.0 Above 



32.0 
32.0 



Above 
Above 



5.6 

33.0 
2.3 






69 212660_at
70 213513_x_at
71 222603_at
72 238558_at
73 202391_at
74 202604_x_at
75 203435_s_at
76 204445_s_at
77 209705_at
78 214366_s_at
79 215000_s_at
80 220643_s_at
81 226459_at
82 238712_at
83 229686_at
84 222620_s_at
85 224516_s_at



frame 24
KIAA0239 protein
actin related protein 2/3 complex, subunit 2, 34kDa
hypothetical protein FLJ23309
ESTs
brain abundant, membrane attached signal protein 1
a disintegrin and metalloproteinase domain 10
membrane metallo-endopeptidase (neutral endopeptidase, enkephalinase, CALLA, CD10)
arachidonate 5-lipoxygenase
likely ortholog of mouse metal response element binding transcription factor 2
arachidonate 5-lipoxygenase
fasciculation and elongation protein zeta 2 (zygin II)
Fas apoptotic inhibitory molecule
Homo sapiens gastric cancer-related protein GCYS-20 (gcys-20) mRNA, complete cds; homology with mouse epidermal growth factor receptor pathway substrate 8
ESTs
cDNA FLJ35637 fis
hypothetical protein similar to mouse Dnajl1
hypothetical protein HSPC195



KIAA023 
9 

ARPC2 



5q31.1 AI735639 31.7 Below 
2q36.1 BG034239 31.7 Above 



FLJ23309 9p24 AL136980 31.7 Above 



BASP1 



5pl5.1- 
pl4 



AI445833 
NM 006317.1 



31.7 
31.3 



MME 



ALOX5 
M96 



ALOX5 
FEZ2 

FAIM 



FLJ35637 
DNAJL1 

HSPC195 



3q25.1- NM 007287.1 
q25.2 



10qll.2 AI361S50 
lp22.1 AF073293.1 



31.3 
31.3 



10qll.2 AA995910 
2p21 AL1 17593.1 



31.3 
31.3 



Above 
Above 



ADAM10 15q22 NM_001110.1 31.3 Above



31.3 Below 



Below 
Below 



Below 
Above 



3q23 NM_018147.1 31.3 Above 
AW575754 31.3 Above 



1.7 
1.3 

3.6 

3.8 
2.1 

1.8 
54.8 



687.0 
1.5 



54.7 
1.7 

2.9 

1.6 



BF801735 


31.3 


Above 


2.7 


AI436587 


31.0 


Below 


1.5 


BF591419 


29.8 


Above 


2.4 


BC006428.1 


29.8 


Above 


2.7 




86 203217_s_at
87 204030_s_at
88 209191_at
89 213541_s_at
90 213773_x_at
91 219243_at
92 219256_s_at
93 223358_s_at
94 224796_at
95 203076_s_at
96 212385_at
97 216026_s_at
98 217118_s_at
99 219821_s_at
100 201875_s_at



sialyltransferase 9 (CMP-NeuAc:lactosylceramide alpha-2,3-sialyltransferase; GM3 synthase)
schwannomin interacting protein 1
tubulin beta-5
v-ets erythroblastosis virus E26 oncogene like (avian)
Williams Beuren syndrome chromosome region 20A
immunity associated protein 4
hypothetical protein FLJ20356
phosphodiesterase 7A
development and differentiation enhancing factor 1
MAD, mothers against decapentaplegic homolog 2 (Drosophila)
cDNA FLJ11918
polymerase (DNA directed), epsilon
KIAA0930 protein
hypothetical protein FLJ20330
hypothetical protein FLJ21047



SIAT9 2p11.2 NM_003896.1 28.8 Below 2.1
SCHIP1 3q25.32 NM_014575.1 28.8 Below 17.6
TUBB-5 BC002654.1 28.8 Above 6.4
ERG 21q22.3 AI351043 28.8 Below 2.8
WBSCR20A 7q11.23 AW248552 28.8 Above 1.3
HIMAP4 7q35 NM_018326.1 28.8 Below 13.4
FLJ20356 4p16.1 NM_018986.1 28.8 Below 2.6
PDE7A 8q13 AW269834 28.8 Above 1.5
DDEF1 8q24.1-q24.2 W03103 28.8 Below 1.8
MADH2 18q21.1 U65019.1 28.7 Below 2.0
FLJ11918 AK021980.1 28.7 Below 3.2
POLE 12q24.3 AL080203.1 28.7 Below 3.0
KIAA0930 22q13.31 AK025608.1 28.7 Above 1.9
FLJ20330 6pter-p22.1 NM_018988.1 28.7 Below 5.5
FLJ21047 1q23.2 NM_024569.1 28.5 Above 2.0



Table 67. Top 100 chi-square probe sets selected for T-ALL a 

U133 probe set   Gene Description   Symbol   Chromosomal Location   GenBankRef   Chi-square   T-ALL above/below mean   Fold change



1 201137_s_at major HLA- 

histocompatibility DPB 1 
complex, class II, 
DP beta 1 

2 202113_s_at sorting nexin 2 SNX2 



6p21.3 NM_002121.1 100.0 Below

5q23 AF043453.1 100.0 Below



4.2 






3 202114_at sorting nexin 2
4 203675_at nucleobindin 2
5 204670_x_at
6 205297_s_at
7 205456_at
8 206398_s_at
9 208306_x_at
10 208894_at
11 209312_x_at
12 209619_at
13 210116_at
14 210982_s_at
15 211990_at
16 211991_s_at
17 213539_at
18 214049_x_at



major histocompatibility complex, class II, DR beta 3
CD79B antigen (immunoglobulin-associated beta)
CD3E antigen, epsilon polypeptide (TiT3 complex)
CD19 antigen
major histocompatibility complex, class II, DR beta 4
major histocompatibility complex, class II, DR alpha
major histocompatibility complex, class II, DR beta 1
CD74 antigen (invariant polypeptide of major histocompatibility complex, class II antigen-associated)
SH2 domain protein 1A, Duncan's disease (lymphoproliferative syndrome)
major histocompatibility complex, class II, DR alpha
major histocompatibility complex, class II, DP alpha 1
major histocompatibility complex, class II, DP alpha 1
CD3D antigen, delta polypeptide (TiT3 complex)
CD7 antigen (p41)



SNX2 
NUCB2 

HLA- 
DRB3 



CD79B 



CD3E 



CD19 
HLA- 
DRB4 



HLA- 
DRA 



HLA- 
DRB1 



CD74 



5q23 
llpl5.1- 
p!4 
6p21.3 



17q23 
llq23 



16pll.2 
6p2L3 



NM_003100.1 100.0 Below 4.6
NM_005013.1 100.0 Above 3.6
NM_002125.1 100.0 Below 13.4
NM_000626.1 100.0 Below 23.3
NM_000733.1 100.0 Above 20.7
NM_001770.1 100.0 Below
NM_021983.2 100.0 Below



6p21.3 M60334.1 100.0 Below 



5693.6 
8.3 



20.9 



6p21.3 U65585.1 100.0 Below 12.6 



5q32 



K01 144.1 100.0 Below 



SH2D1A 



HLA- 
DRA 



HLA- 
DPA1 



HLA- 
DPA1 



CD3D 
CD7 



Xq25- 
q26 



19 214551_s_at CD7 antigen (p41) CD7



17q25.2- 
q25.3 
17q25.2- 
q25.3 




15.1 



AF072930.1 100.0 Above 150.7 



6p21.3 M60333.1 100.0 Below 23.4 



6p21.3 M27487.1 100.0 Below 19.6 



6p21.3 M27487.1 100.0 Below 24.5 



llq23 NM_000732.1 100.0 Above 35.7 



AI829961 100.0 Above 312.2 

NM_006137.2 100.0 Above 228.1

20 217147_s_at
21 217478_s_at
22 221969_at
23 227646_at
24 229487_at
25 229838_at
26 232204_at
27 203965_at
28 204891_s_at
29 205255_x_at
30 207655_s_at
31 209771_x_at
32 211796_s_at
33 213792_s_at
34 215193_x_at
35 216379_x_at
36 219191_s_at
37 219563_at
38 219724_s_at
39 221750_at
40 226157_at
41 226496_at
42 266_s_at



T-cell receptor 
interacting 
molecule 
MHC, class Ha, 
HLA-DMA 
paired box gene 5 
(B-cell lineage 
specific activator 
protein) 

early B-cell factor 
cDNA FLJ39389 
fis 

cDNA FLJ39156 
fis 

early B-cell factor 
ubiquitin specific 
protease 20 
lymphocyte- 
specific protein 
tyrosine kinase 
transcription 
factor 7 (T-cell 
specific, HMG- 
box) 

B-cell linker 



TRIM 3ql3 AJ240085.1 



HLA- 
DMA 
PAX5 



EBF 

FLJ39389 

FLJ39156 

EBF 
USP20 

LCK 



X76775 
9pl3 BF510692 



5q34 
5 



BG435302 
W73890 

AI377271 



5q34 AF208502.1 

9q34. 12- NM_006676. 1 
q34.13 

lp34.3 NMJ)05356.1 




100.0 Above 42.6 

100.0 Below 11.9 

100.0 Below 3922.0 

100.0 Below 85.0 

100.0 Below 7685.7 

100.0 Above 12.7 

100.0 Below 7129.1 



TCF7 5q31.1 NM_003202.1 



BLNK 



CD24 antigen CD24, 
(small cell lung 
carcinoma cluster 
4 antigen) 

T cell receptor TRB 
beta locus 

insulin receptor INSR 



10q23.2- NMJM3314.1 
q23.33 

6q21 AA761181 



7q34 



AF043 179.1 



HLA- 
DRB3 



major 

histocompatibility 
complex, class II, 
DR beta 3 
KIAA1919 
protein 

bridging integrator 
2 

hypothetical FLJ21276 
protein FLJ21276 
KIAA0748 gene 
product 
3-hydroxy-3- 
methylglutaryl- 
Coenzyme A 
synthase 1 
(soluble) 
cDNA FL J39 131 
fis 

hypothetical 
protein FLJ22611 
CD24 antigen 
(small cell lung 
carcinoma cluster 
4 antigen) 



KIAA191 
9 . 
BIN2 



KIAA074 
8 

HMGCS1 



19pl3.3 r AA485908 
pl3.2 

6p21.3 AJ297586.1 



6q22.1 AK000168.1 
12ql3 NM_016293.1 
14q32.2 NM_J)24633.1 
12ql2 NMJ)14796.1 
5pl4-pl3 BG035985 



91.3 


Above 


9.0 


91.3 


Above 


13.8 


91.3 


Above 


8.4 


91.3 


Below 


103.2 


91.3 


Below 


40.1 


91.3 


Above 


20.7 


91.3 


Below 


8.0 


91.3 


Below 


12.1 



91.3 Below 44.0 

91.3 Above 271.0 

91.3 Below 5.8 

91.3 Above 11.6 

91.3 Above 3.4 



FLJ39131 


3 


AI569747 


91.3 


Above 


4.4 


FLJ22611 


9pll.l 


BG291039 


91.3 


Below 


7.6 


CD24 


6q21 


L33930 


91.3 


Below 


69.7 






A 1 

43 


393 lo_at 


l -cell 

leukemia/lympho 


TPT 1 A 


14q32.1 


X82240 


91.3 


Below 


367.4 


A A 

44 


zU4zl4_s_at 


ma 1A 

K/idj/, riit/iiiuei 
RAS oncogene 


R AR3? 


6q24.3 


NMJ)06834.1 


90.6 


Above 


127.9 


43 


/U4 / / /_S_ai 


family 
nidi, l -ecu 
differentiation 


MAL 


2cen-ql3 


NM_002371.2 


90.6 


Above 


96.8 


46 


204o90_s_at 


protein 
lympnocyre- 
specific protein 


T PTC 


lp34.3 


U07236.1 


90.6 


Above 


18.6 






tyrosine kinase 








90.6 


Below 


11.4 


47 


205049_s_at 


CD79A antigen 
(immunoglobulin- 


CD79A 


19ql3.2 


NM_001783.1 






associated alpha) 










Above 


352.0 


48 


205254_x_at 


transcription 
factor 7 (T-cell 
speciiic, rTivivjr- 


TCF7 


5q31.1 


AW027359 


90.6 


49 


205504_at 


box) 
Bruton 

agammaglobuline 
mia tyrosine 


X_> 1 XV. 


Xq21.33- 
q2Z 


NM_000061.1 


90.6 


Below 


6.6 






kinase 










Above 


15.9 


50 


210915_x_at 


T cell receptor 


TRB 


7q34 


M15564.1 


90.6 






□eta locus 










Above 


1963.5 


51 


211211_x_at 


SH2 domain 
protein 1A, 
Duncan's disease 
(lympnopi onieran 


SH2D1A 


AqZ!>- 










ve syndrome) 










Above 


7411.2 


52 


21383 0_at 


l ceil receptor 




14qll.2 


AW007751 


90.6 






delta locus 












253.7 


53 


216191_s_at 


T cell receptor 


TRD 


14qll.2 


X72501.1 


90.6 


Above 






delta locus 










Above 


151.9 


54 


217143_s_at 


1 cell receptor 


TDf\ 
1 tS±J 


14nll 2 


X06557 1 


90.6 






delta locus 










Above 


11.6 


55 


219528_s_at 


B-cell 

CLL/lymphoma 
1 IB (zinc finger 


BCL11B 


14q32.31 
-q32.32 


NM_022898.1 


90.6 


56 


220418_at 


protein) 
ubiquitin 
associated and 
ori-j Qomaiii 


UBASH3 

A • 


21q22.3 


NM_018961.1 


90.6 


Above 


759.3 






containing, A 










Above 


11.7 


57 


222895_s_at 


T5 /-./all 

r>-cen 

CLL/lymphoma 
11B (zinc finger 


Rf^r 1 1 r 


14a32 31 
-qoZ.jz 


AA918317 


90.6 


58 


223553_s_at 


protein) 
hypothetical 


FLJ22570 


5q35.3 


BC004564.1 


90.6 


Below 


6.1 






protein FLJ22570 










Below 


3.6 


59 


225090_at 


HRD1 protein 


HRD1 


llql2 


AA844682 


90.6 


60 


226459_at 


Homo sapiens 
gastric cancer- 
related protein 
GCYS-20 (gcys- 
20) mRNA, 






AW575754 


90.6 


Below 


10.7 






complete cds 










Below 


4.7 


61 


228314_at 


cDNA FLJ37485 
fis 


FLJ37485 




BE877357 


90.6 






62 201384_s_at
63 202540_s_at
64 203198_at
65 203932_at
66 204613_at
67 205267_at
68 208650_s_at
69 208651_x_at
70 209995_s_at
71 210038_at
72 211126_s_at
73 220068_at
74 226245_at
75 202615_at
76 224861_at
77 201194_at
78 201349_at
79 202539_s_at



membrane component, chromosome 17, surface marker 2 (ovarian carcinoma antigen CA125)
3-hydroxy-3-methylglutaryl-Coenzyme A reductase
cyclin-dependent kinase 9 (CDC2-related kinase)
major histocompatibility complex, class II, DM beta
phospholipase C, gamma 2 (phosphatidylinositol-specific)
POU domain, class 2, associating factor 1
CD24 antigen (small cell lung carcinoma cluster 4 antigen)
CD24 antigen (small cell lung carcinoma cluster 4 antigen)
T-cell leukemia/lymphoma 1A
protein kinase C, theta
cysteine and glycine-rich protein 2
pre-B lymphocyte gene 3
cDNA DKFZp451C132
cDNA DKFZp686D0521
cDNA FLJ31057 fis
selenoprotein W, 1
solute carrier family 9 (sodium/hydrogen exchanger), isoform 3 regulatory factor 1
3-hydroxy-3- 



M17S2 17q21.1 NM_005899.1 83.8 Above 3.3 



HMGCR 



CDK9 



HLA- 
DMB 



5ql3.3- 
ql4 



9q34.1 
6p21.3 



NM 000859.1 83.8 Above 



4.4 



NM 001261.1 83.8 
NM 002118.1 83.8 



Below 4.8 
Below 7.9 



PLCG2 


16q24.1 


NM_002661.1 


83.8 


Below 


3.9 


POU2AF1 


llq23.1 


NM_006235.1 


83.8 


Below 


110 
1 l.Z 






BG327863 


83.8 


Below 


74.7 


CD24 


6q21 


M58664.1 


83.8 


Below 


52.7 


TCL1A 


14q32.1 


BC003574.1 


83.8 


Below 


20166. 
2 


PRKCQ 


10pl5 


AL137145 


83.8 


Above 


12.7 


CSRP2 


12q21.1 


U46006.1 


83.8 


Below 


18.0 


VPREB3 


22qll.23 


NM_013378.1 


83.8 


Below 


6559.8 


DKFZp45 




U55984 


83.8 


Above 


8.7 


1C132 










3.1 


DKFZp68 




BF222895 


82.2 


Above 


6D0521 












FLJ31057 




BF477658 


82.2 


Above 


3.5 


SEPW1 


19ql3.3 


NM_003009.1 


82.0 


Above 


3.8 


SLC9A3R 
1 


17q25.2 


NM_004252.1 


82.0 


Above 


2.9 


HMGCR 


5ql3.3- 


AL5 18627 


82.0 


Above 


3.5 












methylglutaryl- 










Coenzyme A 










reductase 




80 


203588_ 


s_at 


transcription 


TFDP2 








factor Dp-2 (E2F 










dimerization 










partner 2) 


PTPN7 


81 


204852_ 


s__at 


protein tyrosine 








UIlUbUllaLaoC, HUH 










receptor type 7 




82 


207434_ 


s_at 


FXYD domain 


FXYD2 








containing ion 










transport regulator 
2 




83 


20S872_ 


s_at 


DNA segment, 


D5S346 








single copy probe 










LNS-CAI/LNS- 










CAII 




84 


209200__ 


. at 


MADS box 


MEF2C 








transcription 










enhancer factor 2, 










polypeptide C 










(myocyte 










enhancer factor 










2C) 




S5 


212795_ 


at 


KIAA1033 


KIAA103 








protein 


3 


86 


212827_ 


at 


immunoglobulin 


IGHM 








heavy constant mu 




87 


213193, 


x_at 


T cell receptor 


TRB 








beta locus 




88 


221002_ 


_s_at 


tetraspanin similar 


DC- 








to 1JV14 or y 




89 


225314 


.at 


hypothetical 


MGC4541 








protein 


6 








MGC45416 




90 


227432_ 


s_at 


insulin receptor 


INSR 


91 


203332_ 


s_at 


inositol 


INPP5D 








polyphosphate-5 - 










phosphatase, 










145kDa 




92 


203589_ 


_s_at 


transcription 


TFDP2 








factor Dp-2 (E2F 










Qimenzaiion 










partner 2) 




93 


205674 


_x_at 


FXYD domain 


FXYD2 








containing ion 










transport regulator 
2 




94 


209881. 


_s_at 


Linker for 


LAT 








activation of T 










cells 




95 


211005. 


at 


Linker for 


LAT 








activation of T 










cells 




96 


211075 


__s_at 


CD47 


CD47 


97 


211210 


_x_at 


SH2 domain 


SH2D1A 








protein 1A, 





3q23 


BG034328 


82.0 


Above 


17.5 


lq32.1 


NMJ)02832.1 


82.0 


Above 


9.5 


llq23 


NMJ)2 1603.1 


82.0 


Above 


14.6 


5q22-q23 


AA814140 


82.0 


Below 


2.6 


5ql4 


N22468 


82.0 


Below 


7.5 


12q24.11 


AL137753.1 


82.0 


Below 


2.4 


14q32.33 


X17115.1 


82.0 


Below 


1 1 1 
13.1 


7q34 


AL559122 


82.0 


Above 


10.9 


10q23.2 


NM_030927.1 


82.0 


Below 


2.1 


4pl2 


BG291649 


82.0 


Above 


5.5 


19pl3.3- 

pl3.2 

2q36-q37 


AI215106 
NM_005541.1 


82.0 
81.5 


Below 
Below 


6.0 
2.2 


3q23 


NM_006286.1 


81.5 


Above 


35.1 


llq23 


NM_00 1680.2 


81.5 


Above 


12.2 


16ql3 


a T7AT /CTAAC 1 


Ol.J 


Above 


1 S9^ 4 


16ql3 


AF036906.1 


81.5 


Above 


67.8 


Xq25- 


Z25521.1 
AF100539.1 


81.5 
81.5 


Above 
Above 


2.1 
300.2 



q26 




98 213601 at 



Duncan's disease 
(lymphoproliferati 
ve syndrome) 
slit homolog 1 
(Drosophila) 

99 213S57_s_at CD47 antigen 

(Rh-related 
antigen, integrin- 
associated signal 
transducer) 

100 214924__s_at KIAA1042 

protein 



SLIT1 
CD47 



10q23.3- AB011537.2 81.5 
q24 

3ql3.1- BG230614 81.5 
ql3.2 



Above 1752.1 
Above 2.2 



KIAA104 3p25.3- 
2 p24.1 



AK000754.1 



81.5 Below 2.3 



Table 68. Top 100 chi-square probe sets selected for TEL-AML1

                                                                                                                     Chi-     TEL-AML
                                                                                       Chromosomal                   square   above/       Fold
No.  U133 probe    Gene Description                                          Symbol    Location     GenBankRef       value    below mean   change
1    224722_at     KIAA1323                                                  KIAA1323  18q11.1      W80418           75       Above        7.6
2    227377_at     FLJ12722                                                  FLJ12722  17q21.32     AK022784.1       75       Above        2446.3
3    237206_at     EST                                                       -         17p12        AI452798         75       Above        23.7
4    241505_at     EST                                                       -         -            BF513468         75       Above        13.4
5    203184_at     Fibrillin 2 (congenital contractural arachnodactyly)      FBN2      5q23.2       NM_001999.2      69.1     Above        14.4
6    205109_s_at   Rho guanine nucleotide exchange factor (GEF) 4            ARHGEF4   2q22         NM_015320.1      69.1     Above        148.1
7    210650_s_at   Piccolo                                                   PCLO      7q21.11      BC001304.1       69.1     Above        101.2
8    213558_at     Piccolo                                                   PCLO      7q21.11      AB011131.1       69.1     Above        77.5
9    220451_s_at   Livin IAP (inhibitor of apoptosis)                        BIRC7     20q13.3      NM_022161.1      69.1     Above        25.4
10   224720_at     KIAA1323                                                  KIAA1323  18q11.1      W80418           69.1     Above        4.3
11   235694_at     IMAGE:4661943                                             -         20q13.33     N49233           69.1     Above        9.3
                   Unknown EST                                                                                                             3.7
12   202808_at     Hypothetical                                              FLJ20154
Biologic insights from the new class defining genes

Interestingly, the overall quantitative pattern of expression of the discriminating
genes varied significantly between leukemia subtypes (Table 69). Within the B-cell
lineage leukemias, the E2A-PBX1, TEL-AML1, BCR-ABL, and Hyperdiploid >50
chromosomes subtypes were characterized primarily by genes that were overexpressed,
whereas almost 40% of the discriminating genes that characterized MLL fusion gene
expressing leukemias were underexpressed. More remarkably, the discriminating
genes for the leukemia subtypes defined by chimeric transcription factors were
markedly overexpressed, with average fold increases of 112 and 48 for E2A-PBX1
and TEL-AML1, respectively. By contrast, the discriminating genes for BCR-ABL
and MLL fusion gene expressing leukemias showed average fold increases of only
6.8 and 8.6, respectively, whereas the discriminating genes for hyperdiploid >50
chromosomes had an average fold increase of only 2.6. These data suggest that
the quantitative global changes in a cell's expression profile vary markedly depending
on the genetic lesion(s) that underlie the initiation of the leukemic process.



Table 69. Summary of fold change by diagnostic subgroup (by gene)

Subgroup             Mean fold change    Range
BCR-ABL              6.8                 1.1-90.5
E2A-PBX1             112.0               1.6-5435
Hyperdiploid >50     2.6                 1.3-27.2
MLL rearrangement    8.6                 1.0-75
T-ALL                387                 2.1-7685
TEL-AML1             48.3                1.5-2446



Tables 70-74 show genes whose expression is limited to a single B-cell
lineage class. These genes therefore function not only as class discriminators in the
decision tree format, but also as class discriminators in a parallel format in which a
class is distinguished against all others. Thus, these genes have the potential of
serving as unique class specific diagnostic or therapeutic targets. In addition, these
genes may provide unique insights into the underlying biology of the different leukemia
subtypes. For example, BCR-ABL expressing ALLs are characterized by the
overexpression of Dynactin 4, which encodes a RING finger containing protein that is
part of the 20S dynactin multisubunit complex involved in movement, intracellular
transport and division through its interaction with the cytoplasmic microtubule-based
motor dynein; PSTPIP2, which encodes a proline/serine/threonine phosphatase-interacting
protein that is also involved in controlling the organization of the
cytoskeleton, and is tyrosine phosphorylated following activation of receptor tyrosine
kinases (Karki et al. (2000) J. Biol. Chem. 275:4834-4839); and several novel ESTs.
Table 70: Genes highly Correlated with BCR-ABL

GenBank Reference    Gene Description
AK002064             Dynactin 4
NM_024600            FLJ20898
NM_024430            Pro-Ser-Thr phosphatase interacting protein 2
AV648669             FLJ39877



E2A-PBX1 expressing leukemias are characterized by the expression of
PBX1, the receptor tyrosine kinase gene C-MERTK, and the FAT tumor suppressor,
which encodes a member of the cadherin repeat domain containing family of
transmembrane proteins (see Table 64). Among the discriminating genes were two
genes, EB-1 and Wnt16, that had previously been shown to be overexpressed in this
leukemia subtype (Wu et al. (1998) J. Biol. Chem. 273:30487-30496; and Fu et al.
(1999) Oncogene 18:4920-4929). In addition, the retinal degeneration B beta gene
(McWhirter et al. (1999) Proc. Natl. Acad. Sci. USA 96:11464-11469) and a
number of novel ESTs were identified as being uniquely overexpressed in this
leukemia subtype, whereas SOCS2, a negative regulator of cytokine signaling, was
found to be underexpressed (Fullwood and Hsuan (1999) J. Biol. Chem. 274:31553-31558).



Table 71: Genes highly Correlated with E2A-PBX1

GenBank Reference    Gene Description
NM_012417            retinal degeneration B beta
AI971602             MGC10485
AW005572             EB-1
AL357503             Q9H4T4 like
NM_016087            Wnt16



Hyperdiploid leukemias with >50 chromosomes were characterized by the
overexpression of MST4, which encodes a novel serine/threonine kinase (Horvat and
Medrano (2001) Genomics 72:209-212); SH3BP2, which encodes an SH3-domain
containing binding protein (Lin et al. (2001) Oncogene 20:6559-6569); histone
deacetylase 6, which encodes a protein involved in transcriptional repression; the
retinoblastoma binding protein 7 gene, which encodes a protein found in many
functional histone deacetylase complexes (Bell et al. (1997) Genomics 44:163-170);
and TNRC11, a trinucleotide repeat containing gene that is also known as HOPA or
TRAP230 and is part of the thyroid hormone receptor-associated protein (TRAP)
complex (Huang et al. (1991) Nature 350:160-162; and Ito et al. (1999) Mol. Cell
3:361-370).



Table 72: Genes highly Correlated with Hyperdiploid >50

GenBank Reference    Gene Description
NM_002893            Retinoblastoma binding protein 7
AB000462             SH3-domain binding protein 2
NM_006044            Histone deacetylase 6
BC004354             trinucleotide repeat containing 11
NM_016542            Mst3 and SOK1-related kinase


Cases with MLL gene rearrangements were characterized by the
overexpression of HOXA9 and Meis1 (see Table 66). Included in the up-regulated genes
was a novel transcript from chromosome 20 that was overexpressed almost 25 fold.
This transcript is predicted to encode a protein of 280 amino acids that shows a low
level of homology to a lysosome-associated membrane glycoprotein (LAMP). Also
specifically overexpressed in this leukemia subtype is a gene encoding an insulin-like
growth factor (IGF) II RNA binding protein that has been shown to repress the
translation of the IGF-II growth factor (Armstrong et al. (2002) Nat. Genet. 30:41-47).
Among the down-regulated genes were neuron navigator 1 (Nielsen et al. (1999)
Mol. Cell. Biol. 19:1262-1270), which encodes an 1874 amino acid protein and is
involved in the directional guidance of migratory cells, and a member of the TCF/LEF
family of transcription factors, TCF-4. TCF-4 functions downstream of β-catenin in
the Wnt-mediated signaling cascade and has been shown to be essential for the
maintenance of intestinal crypt stem cells (Maes et al. (2002) Genomics 80:21-30).

Table 73: Genes highly Correlated with MLL

GenBank Reference    Gene Description
NM_012261            C20orf103
AI202327             FLJ37247
NM_00654b            IGF-II mRNA-binding protein [number illegible]
NM_018401            gene for serine/threonine protein kinase
NM_018728            myosin 5C
AB032977             neuron navigator 1



Genes that were discriminators of TEL-AML1 leukemias included a gene
localized to chromosome 18q11.1 that encodes a 795 amino acid protein with 8
ankyrin repeat domains and a C-terminal RING finger domain. This combination of
domains is found in only a limited number of mammalian proteins, most notably
BARD1, a regulator of the BRCA1 tumor suppressor (Korinek et al. (1998) Nat.
Genet. 19:379-383). Other genes overexpressed in this subtype include desmocollin
(Irminger-Finger and Leung (2002) Int. J. Biochem. Cell Biol. 34:582-587); FLJ12722,
a novel protein of unknown function; and a member of the IAP family of apoptosis
inhibitors, BIRC7, which is overexpressed 25 fold (Whittock et al. (2000) Biochem.
Biophys. Res. Commun. 276:454-460).



Table 74: Genes highly Correlated with TEL-AML1

GenBank Reference    Gene Description
W80418               KIAA1323
AK022784             FLJ12722
NM_022161            BIRC7
AI452798             FLJ39434
AI797281             Desmocollin 3
Expression profiling accurately identifies the prognostic subtypes of ALL

To assess the accuracy of identifying prognostically important ALL genetic
subtypes by expression profiling, the class discriminating genes identified using a
chi-square metric were used in an ANN-based supervised learning algorithm. Class
assignment utilized the decision tree differential diagnostic format described
elsewhere herein, and required that the node value for assignment exceed a
statistically defined confidence level. This approach resulted in exceptionally
accurate class prediction in a randomly selected training set that consisted of
three-fourths of the total cases (100 cases). When this classification model was then
applied to a blinded test set consisting of the remaining 32 samples, an overall
accuracy of 97% was achieved for class assignment. To control for over-fitting of the
data, 10 additional rounds of this analysis were performed; in each round, new
training and test sets were developed, genes were reselected using the new training
set, and their performance was then assessed on the new test set. This resulted in an
average accuracy of class assignment in the blinded test sets of 97.2%, with a range
from 93.8% to 100%. Although the number of genes required for optimal class assignment
varied between classes, the best overall diagnostic accuracy was achieved using the
top 50 genes per class. A similar level of accuracy was achieved using a variety of
other supervised learning algorithms, including k-NN and SVM.

Interestingly, of the rare misclassification errors, two were cases of BCR-ABL
expressing ALL that were classified by gene expression analysis as hyperdiploid >50
chromosomes. The karyotype of these cases showed the presence of both the
Philadelphia chromosome and a hyperdiploid karyotype consisting of >50
chromosomes, including trisomy of chromosomes X and 21 (data not shown). The
expression profile thus correctly identified the presence of the hyperdiploid >50
chromosomes class; however, since each case is assigned to only a single class, the
algorithm failed to identify the presence of BCR-ABL. Nevertheless, the
data presented demonstrate the exceptional accuracy of this single platform for the
diagnosis of the prognostically important subtypes of ALL.
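
The repeated re-splitting used to control for over-fitting can be sketched as follows. This is a minimal illustration only, not the implementation used in the study: the function names, the `fit` callback, and the toy nearest-centroid classifier (standing in for chi-square gene selection plus the ANN) are assumptions introduced here. The point it shows is that gene selection and model building happen inside each round, on the training cases only.

```python
import random
from statistics import mean


def fit_nearest_centroid(train_profiles, train_labels):
    """Toy stand-in (assumption) for gene selection plus classifier training."""
    centroids = {}
    for label in set(train_labels):
        rows = [p for p, l in zip(train_profiles, train_labels) if l == label]
        centroids[label] = [mean(column) for column in zip(*rows)]

    def predict(profile):
        def distance(label):
            return sum((a - b) ** 2 for a, b in zip(profile, centroids[label]))
        return min(centroids, key=distance)

    return predict


def repeated_holdout(profiles, labels, fit, rounds=10, n_test=32, seed=0):
    """Return the blinded test-set accuracy obtained in each round.

    `fit` sees training cases only, so anything it selects or learns
    is redone from scratch for every new train/test split.
    """
    rng = random.Random(seed)
    accuracies = []
    for _ in range(rounds):
        order = list(range(len(profiles)))
        rng.shuffle(order)
        test_idx, train_idx = order[:n_test], order[n_test:]
        predict = fit([profiles[i] for i in train_idx],
                      [labels[i] for i in train_idx])
        hits = sum(predict(profiles[i]) == labels[i] for i in test_idx)
        accuracies.append(hits / n_test)
    return accuracies
```

Reporting `mean(accuracies)` together with `min(accuracies)` and `max(accuracies)` corresponds to the average and range quoted above.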


Overview of Experimental Procedure 

A. Gene expression profiling 

The preparation of mononuclear cell suspensions from diagnostic bone
marrow aspirates, extraction of total RNA, and preparation of hybridization solutions
were performed as described for Example 1. Individual hybridization solutions from
our previous study had been stored at -80°C since initial hybridization (approximately
1 year). These solutions were thawed and hybridized to Affymetrix® HG-U133A and
HG-U133B oligonucleotide microarrays (Affymetrix Inc., Santa Clara, CA) according
to Affymetrix protocols. In two cases where the original hybridization solutions were
no longer available, replicate viably frozen mononuclear cell preparations from the
diagnostic bone marrow aspirate were obtained, and RNA was isolated and cDNA and
cRNA were synthesized, labeled, fragmented and hybridized as described for Example 1.

After sample hybridization, arrays were stained with phycoerythrin-conjugated
streptavidin (Molecular Probes, Eugene, OR). Antibody amplification was
performed with biotinylated anti-streptavidin (Vector Laboratories, Burlingame, CA),
followed by staining with phycoerythrin-conjugated streptavidin (Molecular Probes).
Arrays were scanned using a laser confocal scanner (Agilent, Palo Alto, CA) and then
analyzed with Affymetrix® Microarray Suite 5.0 (MAS 5.0). Detection values
(present, marginal or absent) were determined by default parameters, and signal
values were scaled by global methods to a target value of 500. Microarray scan
images were visually inspected for apparent defects, and Affymetrix internal controls
were utilized to monitor the success of hybridization, washing, and staining
procedures. Minimal quality control parameters for inclusion in the study included
greater than 10% present calls and a GAPDH 3'/5' ratio of < 3. The arrays included in
this study had an average present call rate of 35.9% for the A chip and 21.0% for the B
chip (combined average of 28.5%).
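
The scaling and inclusion criteria described above can be illustrated with a short sketch. It is not the MAS 5.0 implementation: the function names are hypothetical, and the use of a mean trimmed by 2% at each end as the quantity scaled to the target is an assumption about what "global methods" means here. Only the target value of 500 and the two quality thresholds come from the text.

```python
def scale_to_target(signals, target=500.0, trim=0.02):
    """Rescale one array so its trimmed mean signal equals `target`.

    Assumption: the trimmed mean drops the lowest and highest `trim`
    fraction of values before averaging.
    """
    ordered = sorted(signals)
    k = int(len(ordered) * trim)
    kept = ordered[k:len(ordered) - k] if k else ordered
    factor = target / (sum(kept) / len(kept))
    return [s * factor for s in signals]


def passes_qc(percent_present, gapdh_3to5_ratio):
    """Inclusion rule from the text: >10% present calls and GAPDH 3'/5' ratio < 3."""
    return percent_present > 10.0 and gapdh_3to5_ratio < 3.0
```

For example, an array with 35.9% present calls and a GAPDH 3'/5' ratio of 1.2 would pass, while one with 8% present calls would be excluded.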

B. Statistical Analysis 

The dataset was separated into a training set (100 cases) and a test set (32 cases).
The identification of subtype discriminating genes was performed using the training set.
Moreover, both gene discovery and subsequent class predictions were performed
using a differential diagnosis decision tree format. In this format, classification was
performed in a sequential order starting with T-ALL and proceeding in order through
E2A-PBX1, TEL-AML1, BCR-ABL, MLL rearrangement, and Hyperdiploid >50
chromosomes. Unassigned cases were classified as other. Samples classified into the
class under diagnosis were removed prior to proceeding to the next level in the
decision tree. In addition, prior to analysis a variation filter was applied to remove any
probe set that showed minimal variation across the dataset, and thus contributed
minimally, if at all, to the discrimination of leukemia subtypes. Specifically, probe
sets were eliminated from further analysis if the number of cases with a present call
was less than one-half the number of samples comprising the leukemia subgroup under
analysis, if the signal value was < 100 in all samples in the dataset, or if the
difference between the maximal and minimal signal values in the dataset was less than 100.
In addition, all signal values with absent or marginal calls were reset to 1, while
signal values with a present ("P") call and a signal < 100 were reset to 100. The values
for signals from the Affymetrix® control probe sets were removed prior to analysis.
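
The variation filter and signal flooring rules above can be written down compactly. The sketch below is illustrative only: the function names and the "P"/"M"/"A" call encoding are assumptions, and the text does not say whether flooring is applied before or after filtering, so the two steps are kept as separate functions.

```python
def keep_probe_set(signals, calls, subgroup_size):
    """Apply the three elimination rules to one probe set.

    signals: signal values across all samples in the dataset
    calls: detection calls ("P", "M" or "A"), one per sample
    subgroup_size: number of samples in the subgroup under analysis
    """
    n_present = sum(call == "P" for call in calls)
    if n_present < subgroup_size / 2:
        return False  # too few present calls
    if max(signals) < 100:
        return False  # signal below 100 in every sample
    if max(signals) - min(signals) < 100:
        return False  # dynamic range below 100
    return True


def floor_signals(signals, calls):
    """Reset absent/marginal signals to 1 and low present signals to 100."""
    floored = []
    for signal, call in zip(signals, calls):
        if call in ("A", "M"):
            floored.append(1.0)
        elif signal < 100:
            floored.append(100.0)
        else:
            floored.append(float(signal))
    return floored
```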

Unsupervised hierarchical clustering and principal component analysis (PCA)
were performed using GeneMaths software (version 1.5, Applied Maths, Belgium).
Data reduction to define the genes most useful in class distinction was primarily
performed using a chi-square metric. In this procedure, an entropy-based
discretization method was first applied to identify genes whose expression across the
dataset showed differentiation between class and non-class. The assigned
discretized value for the gene was then used in a chi-square calculation to determine
if the association with a class was more than would be expected by random chance.
The stronger the association with the class, the larger the chi-square value calculated.
For genes that could not be discretized, the chi-square value was set to zero.
To evaluate the statistical significance of the discriminating genes, we used a
permutation test in which, for each class, case labels were randomly reassigned to
generate new groups of identical size. The label-permuted data were discretized again
and the chi-square values were recalculated. The permutation test was repeated a
total of 1000 times. The true chi-square value for each probe set was then compared
to the values generated from the 1000 permutations to determine how many times the
chi-square value for the probe set in a randomly labeled group was greater than that
obtained for the true class distinction. A p value was calculated from the number of
times the permuted chi-square value exceeded the true value in the 1000 permutations.
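
A minimal sketch of this gene-scoring procedure for a single probe set is given below. Several simplifications are assumptions made here rather than details from the text: discretization uses a single cut point chosen to minimize class entropy, with no stopping criterion, so a gene is treated as "not discretizable" only when all of its values are identical; the chi-square statistic is the standard 2x2 form without continuity correction; and the p value is expressed as the fraction of permutations that exceed the observed value. All function names are hypothetical.

```python
import math
import random


def entropy(flags):
    """Binary class entropy (bits) of a list of True/False labels."""
    n = len(flags)
    p = sum(flags) / n
    return -sum(q * math.log2(q) for q in (p, 1 - p) if q > 0)


def best_cut(values, in_class):
    """Cut point minimizing weighted class entropy, or None if no cut exists."""
    pairs = sorted(zip(values, in_class))
    best, best_h = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between identical values
        left = [c for _, c in pairs[:i]]
        right = [c for _, c in pairs[i:]]
        h = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if h < best_h:
            best, best_h = (pairs[i - 1][0] + pairs[i][0]) / 2, h
    return best


def chi_square(values, in_class):
    """Chi-square of the 2x2 table (above/below cut) x (class/non-class)."""
    cut = best_cut(values, in_class)
    if cut is None:
        return 0.0  # gene could not be discretized
    a = sum(v > cut and c for v, c in zip(values, in_class))
    b = sum(v > cut and not c for v, c in zip(values, in_class))
    c_low = sum(v <= cut and c for v, c in zip(values, in_class))
    d = sum(v <= cut and not c for v, c in zip(values, in_class))
    denom = (a + b) * (c_low + d) * (a + c_low) * (b + d)
    return len(values) * (a * d - b * c_low) ** 2 / denom if denom else 0.0


def permutation_p(values, in_class, n_perm=1000, seed=0):
    """Observed chi-square and fraction of label permutations exceeding it."""
    rng = random.Random(seed)
    observed = chi_square(values, in_class)
    shuffled = list(in_class)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # keeps group sizes identical
        if chi_square(values, shuffled) > observed:
            exceed += 1
    return observed, exceed / n_perm
```

Because the cut point is re-chosen for every permutation, the permuted statistics are computed the same way as the observed one, which is what makes the comparison fair.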

The discriminating genes selected were then used in supervised learning 
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algorithms to build classifiers that could identify the specific genetic subgroup. 
Algorithms used included k-Nearest Neighbors (k-NN), Support Vector Machine 
(SVM), and an artificial neural network (ANN). See, Example 1, Witten and Frank 
(1999) Data mining: Practical machine learning tools and techniques with Java 

5 implementation. Morgan Kaufinan; Piatt (1 998) Fast training of support vector 
machines using sequential minimal optimization in Advances in kernel methods - 
support vector learning Schlkopf B, Burges C, and Smola A, eds. MIT Press; and 
Cover and Hart (1967) IEEE Transactions on Information TJieoiy 13:21-27. 
Performance of each model was initially assessed by three-fold cross validation on a 

1 0 randomly selected stratified training set. True error rates of the best performing 

classifiers were then determined using the remaining one-fourth of the samples as a 
blinded test group. Class assignment required that a sample's calculated node value 
exceed a statistically determined confidence level in order for it to be assigned to a 
class. Details of the supervised learning algorithms and their use are described below. 
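
As an illustration of the initial model assessment, the following sketch runs a k-nearest-neighbor classifier under stratified three-fold cross validation. It is a simplified stand-in, not the software used in the study: the function names, the squared Euclidean distance, and the default k = 3 are assumptions chosen for the example.

```python
import random
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k training profiles closest to `query`."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(x, query)), y)
        for x, y in zip(train_x, train_y)
    )
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]


def stratified_folds(labels, n_folds=3, seed=0):
    """Split sample indices into folds with similar class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for j, i in enumerate(indices):
            folds[j % n_folds].append(i)
    return folds


def cross_validated_accuracy(x, y, k=3, n_folds=3, seed=0):
    """Fraction of samples predicted correctly when held out of training."""
    correct = 0
    for held_out in stratified_folds(y, n_folds, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        train_x = [x[i] for i in train]
        train_y = [y[i] for i in train]
        correct += sum(knn_predict(train_x, train_y, x[i], k) == y[i]
                       for i in held_out)
    return correct / len(y)
```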


Detailed Experimental Procedures 

A. Patient Dataset 

132 cases of pediatric ALL were selected from the original 327 diagnostic
bone marrow aspirates described in Example 1 for reanalysis on the higher density
U133A and B microarrays. The selection of cases was based on having sufficient
numbers of each subtype to build accurate class predictions, rather than reflecting the
actual frequency of these groups in the pediatric population.

B. Hybridization of microarrays

The hybridization solutions according to Example 1 were thawed at 45°C, then
microcentrifuged for 5 minutes to remove any insoluble material from the mixture.
The hybridization solutions were added to U133A chips and allowed to hybridize for
16 hours at 45°C. At the end of the incubation period, the hybridization solution was
removed from each U133A chip and refrozen. Subsequently, the hybridization solutions
were thawed and hybridized to the U133B chips.

A non-stringent wash buffer (6X SSPE, 0.01% Tween 20) was added to each
chip cassette after the hybridization solution was removed, and the cassette was
allowed to equilibrate to room temperature. The microarray cassettes were then placed
on the fluidics station and the antibody amplification protocol was performed. The
arrays were washed at 25°C with the non-stringent buffer followed by a more stringent
wash at 50°C with 100 mM MES, 0.1 M NaCl, 0.01% Tween 20. The arrays were then
stained with Streptavidin Phycoerythrin (SAPE, Molecular Probes, Eugene, OR) for
10 minutes at 25°C. Following another non-stringent wash, the arrays were
incubated for 10 minutes at 25°C with an antibody solution (100 mM MES, 1 M
[Na+], 0.05% Tween 20, 2 mg/ml BSA, 0.1 mg/ml goat IgG, and 3 µg/ml biotinylated
antibody). This solution was removed and the cassettes were restained with the SAPE
solution.

Arrays were scanned on a laser confocal scanner (Agilent, Palo Alto, CA) and 
then analyzed with Affymetrix® Microarray Suite 5.0 (MAS 5.0). Detection values 
(present, marginal or absent) were determined by default parameters, and signal 
values were scaled by global methods to a target value of 500. After completing the
scans, the arrays were visually inspected for defects and Affymetrix internal controls
were utilized to monitor the success of hybridization, washing, and staining 
procedures. 

C. Statistical methods 

The chi-square metric and the k-NN and ANN supervised learning algorithms
were performed as described for Example 1. The SVM supervised learning algorithm
that was used in this study is available as part of the software package R v1.6.0. See
Ribeiro and Brown, The ISBA Bulletin 8(1):12-16, and www.r-project.org.

To determine the performance of each model using ANN, a confidence
threshold was built for each diagnostic subtype utilizing a modification of the method
described by Khan et al. (2001) Nat. Med. 7:673-679. Models were built based on a
decision tree format where each level of the decision tree contains only two possible
distinctions, class and non-class (for example, T versus non-T). At each level, using
only samples in the training set, 3 ANN models were built by 3-fold cross validation.
The training set samples were then shuffled and 3 additional ANN models were built.
This model building process was repeated a total of 100 times at each step of the
decision tree. An empirical probability distribution for the ANN output node
value was then built only for the subtype under study, for example, T-ALL at the first
step of the decision tree. Only nodal values greater than 0.5 for each subtype were
included. For each individual sample in the training set, the 100 validation subtype
node values were averaged and compared to the threshold. An individual sample was
assigned to the subtype under study only when its average subtype nodal value was
greater than the 95% confidence threshold. For samples in the test set, subtype nodal
values were averaged from all models generated in the 3-fold cross validation. A
sample was assigned to the class under study when the average subtype nodal value was
greater than the 95% confidence level defined on the training set. A sample not
assigned to the subtype progressed to the next level of the decision tree, where the
entire process was repeated.
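
The sequential assignment logic can be sketched as below. This is an illustration under stated assumptions: the per-subtype confidence thresholds are taken as given inputs (how they are derived from the empirical distribution is not reproduced here), and the function and variable names are hypothetical. The subtype order follows the decision tree described in the Statistical Analysis overview.

```python
from statistics import mean

# Order of the differential diagnosis decision tree described in the text.
SUBTYPE_ORDER = ["T-ALL", "E2A-PBX1", "TEL-AML1", "BCR-ABL", "MLL", "Hyperdiploid >50"]


def assign_subtype(node_values, thresholds, order=SUBTYPE_ORDER):
    """Walk the decision tree and return the first subtype that is confidently called.

    node_values: dict mapping subtype -> list of ANN output node values for this
                 sample, one per model built at that level of the tree
    thresholds:  dict mapping subtype -> confidence threshold from the training set
    """
    for subtype in order:
        if mean(node_values[subtype]) > thresholds[subtype]:
            return subtype  # assigned; lower levels are never consulted
    return "Other"  # unassigned at every level
```

Because a sample leaves the tree at the first level where it is assigned, a case carrying two lesions can receive only one label, which is consistent with the two BCR-ABL cases with hyperdiploid karyotypes discussed above.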



All publications and patent applications mentioned in the specification are
indicative of the level of those skilled in the art to which this invention pertains. All
publications and patent applications are herein incorporated by reference to the same
extent as if each individual publication or patent application was specifically and
individually indicated to be incorporated by reference.

Although the foregoing invention has been described in some detail by way of
illustration and example for purposes of clarity of understanding, it will be obvious
that certain changes and modifications may be practiced within the scope of the
appended claims.
THAT WHICH IS CLAIMED:

1. A method of assigning a subject affected by leukemia to a leukemia
risk group, said method comprising:
    a) providing a subject expression profile of a sample from said
subject affected by leukemia;
    b) providing a plurality of reference expression profiles, each
associated with a leukemia risk group selected from the group consisting of T-ALL,
E2A-PBX1, TEL-AML1, BCR-ABL, MLL, Hyperdiploid >50, and Novel, wherein
the subject expression profile and each reference expression profile comprise one or
more values representing the expression level of a gene having differential expression
in at least one leukemia risk group; and
    c) selecting the reference expression profile most similar to the
subject expression profile to thereby assign said subject affected by leukemia to a
leukemia risk group.

2. The method of claim 1 wherein the subject expression profile and the
reference expression profile associated with the T-ALL risk group comprise values
selected from the group consisting of:
    a) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 7;
    b) a value representing the expression level of the gene shown in
Table 14;
    c) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 21;
    d) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 28;
    e) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 35;
    f) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 59; and
    g) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 67.

3. The method of claim 1 wherein the subject expression profile and the
reference expression profile associated with the E2A-PBX1 risk group comprise
values selected from the group consisting of:
    a) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 3;
    b) a value representing the expression level of the gene shown in
Table 10;
    c) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 17;
    d) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 24;
    e) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 31;
    f) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 55;
    g) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 64; and
    h) values representing the expression levels of at least one of the
genes shown in Table 71.

4. The method of claim 1 wherein the subject expression profile and the
reference expression profile associated with the TEL-AML1 risk group comprise
values selected from the group consisting of:
    a) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 8;
    b) values representing the expression levels of the genes shown in
Table 15;
    c) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 22;
    d) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 29;
    e) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 36;
    f) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 55;
    g) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 68; and
    h) values representing the expression levels of at least one of the
genes shown in Table 74.

5. The method of claim 1 wherein the subject expression profile and the
reference expression profile associated with the BCR-ABL risk group comprise
values selected from the group consisting of:
    a) values representing the expression level of at least 20 genes
selected from the genes shown in Table 2;
    b) values representing the expression levels of the genes shown in
Table 9;
    c) values representing the expression level of at least 20 genes
selected from the genes shown in Table 16;
    d) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 23;
    e) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 30;
    f) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 54;
    g) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 63; and
    h) values representing the expression levels of at least one of the
genes shown in Table 70.

6. The method of claim 1 wherein the subject expression profile and the
reference expression profile associated with the MLL risk group comprise values
selected from the group consisting of:
    a) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 5;
    b) values representing the expression levels of the genes shown in
Table 12;
    c) values representing the expression level of at least 20 genes
selected from the genes shown in Table 19;
    d) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 26;
    e) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 33;
    f) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 57;
    g) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 66; and
    h) values representing the expression levels of at least one of the
genes shown in Table 73.

7. The method of claim 1 wherein the subject expression profile and the
reference expression profile associated with the Hyperdiploid >50 risk group
comprise values selected from the group consisting of:
    a) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 4;
    b) values representing the expression levels of the genes shown in
Table 11;
    c) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 18;
    d) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 25;
    e) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 32;
    f) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 56;
    g) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 65; and
    h) values representing the expression levels of at least one of the
genes shown in Table 72.

8. The method of claim 1 wherein the subject expression profile and the
reference expression profile associated with the Novel risk group comprise values
selected from the group consisting of:
    a) values representing the expression level of at least 20 genes
selected from the genes shown in Table 6;
    b) values representing the expression level of the genes shown in
Table 13;
    c) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 20;
    d) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 27;
    e) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 34; and
    f) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 58.

9. The method of claim 1 , wherein said sample from said subject affected 
by ALL comprises leukemic blasts. 

25 10. The method of claim 9, wherein said sample from said subject affected 

by ALL comprises at least 35 % leukemic blasts. 

1 1 . The method of claim 10, wherein said sample from said subject 
affected by ALL comprises at least 75% leukemic blasts. 






12. The method of claim 9 wherein said sample comprises leukemic blasts 
derived from peripheral blood.

13. The method of claim 9 wherein said sample comprises blast cells
derived from bone marrow. 



14. A method of predicting whether a subject affected by leukemia has an 
increased risk of relapse, said method comprising the steps of:

a) assigning the subject affected by leukemia to a leukemia risk 
group selected from the group consisting of T-ALL, Hyperdiploid >50, TEL-AML1, 
MLL, E2A-PBX1, BCR-ABL, and Novel; 

b) providing a subject expression profile of a sample from said 

subject affected by leukemia;

c) providing a reference expression profile associated with the 
occurrence of relapse in the leukemia risk group to which the subject affected by 
leukemia is assigned, wherein the subject expression profile and the reference
expression profile comprise one or more values representing the expression level of a 

gene having differential expression in subjects affected by leukemia who will relapse
after conventional therapy; and 

d) determining whether the subject expression profile shares 
sufficient similarity to the reference expression profile associated with relapse in the 
leukemia risk group to which the subject affected by leukemia is assigned to thereby 

determine whether the subject affected by leukemia has an increased risk of relapse.
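
By way of illustration only, steps (c) and (d) of claim 14 might be sketched in Python as follows. The claim does not specify a similarity measure or a cutoff for "sufficient similarity"; the Pearson correlation, the 0.8 threshold, the function names, and the gene identifiers and values used here are all assumptions made for this sketch and are not taken from the Tables.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of expression values."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a profile with constant values has no defined correlation")
    return cov / sqrt(var_x * var_y)


def has_increased_relapse_risk(subject, relapse_reference, threshold=0.8):
    """Compare a subject profile with the relapse-associated reference profile
    of the risk group to which the subject was assigned in step (a).

    Both profiles are dicts mapping a gene identifier to an expression value.
    The threshold is a hypothetical stand-in for "sufficient similarity".
    """
    genes = sorted(set(subject) & set(relapse_reference))
    if len(genes) < 2:
        raise ValueError("need at least two shared genes to compute a correlation")
    subject_values = [subject[g] for g in genes]
    reference_values = [relapse_reference[g] for g in genes]
    return pearson(subject_values, reference_values) >= threshold


# Hypothetical values, for illustration only.
reference = {"gene_a": 1.0, "gene_b": 5.0, "gene_c": 2.0}
similar_subject = {"gene_a": 1.2, "gene_b": 4.8, "gene_c": 2.1}
dissimilar_subject = {"gene_a": 5.0, "gene_b": 1.0, "gene_c": 4.0}
```

Any other similarity measure or classifier could be substituted for the correlation used in this sketch; only the comparison against a relapse-associated reference profile for the assigned risk group is drawn from the claim.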

15. The method of claim 14, wherein the step of assigning the subject 
affected by leukemia to a leukemia risk group is performed according to the method 
of claim 1. 


16. The method of claim 14, wherein said subject affected by leukemia is
assigned to the T-ALL risk group and said subject expression profile and said 
reference expression profile comprise values representing the expression levels of at 
least 8 genes selected from the genes shown in Table 44. 


17. The method of claim 14, wherein said subject affected by leukemia is
assigned to the Hyperdiploid >50 risk group and said subject expression profile and
said reference expression profile comprise values representing the expression levels of
at least 5 genes selected from the genes shown in Table 45. 

18. The method of claim 14, wherein said subject affected by leukemia is 
assigned to the TEL-AML1 risk group and said subject expression profile and said

reference expression profile comprise values representing the expression levels of at 
least 3 genes selected from the genes shown in Table 46. 

19. The method of claim 14, wherein said subject affected by leukemia is
assigned to the MLL risk group and said subject expression profile and said reference

expression profile comprise values representing the expression levels of at least 5 
genes selected from the genes shown in Table 47. 

20. The method of claim 14, wherein said subject affected by leukemia is 
not assigned to the T-ALL, Hyperdiploid >50, TEL-AML1, MLL, E2A-PBX1, or

BCR-ABL risk group and said subject expression profile and said reference 
expression profile comprise values representing the expression levels of at least 4 
genes selected from the genes shown in Table 48. 

21. A method of predicting whether a subject affected by TEL-AML1 has

an increased risk of developing secondary AML, said method comprising: 

a) providing a subject expression profile of a sample from said 

subject affected by TEL-AML1; 

b) providing a reference expression profile associated with the 
occurrence of secondary AML in subjects affected by TEL-AML1 wherein the

subject expression profile and the reference expression profile comprise one or more 
values representing the expression level of a gene having differential expression in 
subjects affected by TEL-AML1 who will develop secondary AML; and 

c) determining whether the subject expression profile shares
sufficient similarity to the reference expression profile associated with the occurrence
of secondary AML to thereby determine whether the subject affected by TEL-AML1
has an increased risk of developing secondary AML.

22. A method of choosing a therapy for a subject affected by leukemia,
said method comprising: 

a) providing a subject expression profile of a sample from said 

subject affected by leukemia; 
b) providing a plurality of reference expression profiles, each
associated with a leukemia risk group selected from the group consisting of T-ALL,
E2A-PBX1, TEL-AML1, BCR-ABL, MLL, Hyperdiploid >50, and Novel, wherein 
the subject expression profile and each reference expression profile comprise one or 
more values representing the expression level of a gene having differential
expression in at least one leukemia risk group; and

c) selecting the reference expression profile most similar to the 
subject expression profile to thereby choose a therapy for the subject affected by 
leukemia. 
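
As a non-limiting illustration, step (c) of claim 22 might be sketched in Python as a nearest-profile search. The claim does not name a similarity measure; Euclidean distance is assumed here, and the function name, gene identifiers, and expression values are hypothetical. The sketch returns only the risk group whose reference profile is most similar; the choice of therapy for that group is left to the practitioner and is not modeled.

```python
from math import dist  # available in Python 3.8 and later


def most_similar_risk_group(subject, references):
    """Return the risk group whose reference profile is closest to the subject.

    subject:    dict mapping gene identifier -> expression value
    references: dict mapping risk group name -> profile dict with the same genes
    Closeness is measured by Euclidean distance (an assumption of this sketch).
    """
    genes = sorted(subject)
    best_group, best_distance = None, float("inf")
    for group, profile in references.items():
        if set(profile) != set(genes):
            raise ValueError(f"profile for {group} does not cover the same genes")
        d = dist([subject[g] for g in genes], [profile[g] for g in genes])
        if d < best_distance:
            best_group, best_distance = group, d
    if best_group is None:
        raise ValueError("no reference profiles were provided")
    return best_group


# Hypothetical two-gene profiles, for illustration only.
reference_profiles = {
    "T-ALL": {"g1": 9.0, "g2": 1.0},
    "MLL": {"g1": 2.0, "g2": 8.0},
}
subject_profile = {"g1": 8.5, "g2": 1.5}
```

With these made-up values, `most_similar_risk_group(subject_profile, reference_profiles)` selects the T-ALL reference profile, because the subject lies much closer to it than to the MLL profile.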

23. A method of choosing a therapy for a subject affected by leukemia,

said method comprising the steps of: 

a) assigning the subject affected by leukemia to a leukemia risk 

group selected from the group consisting of T-ALL, Hyperdiploid >50, TEL-AML1,
MLL, E2A-PBX1, BCR-ABL, and Novel;

b) providing a subject expression profile of a sample from said

subject affected by ALL; 

c) providing a reference expression profile associated with the 
occurrence of relapse in the leukemia risk group to which the subject affected by 

leukemia is assigned, wherein the subject expression profile and the reference 
expression profile comprise one or more values representing the expression level of a
gene having differential expression in subjects who will relapse after conventional 
therapy; and 

d) determining whether the subject expression profile shares 
sufficient similarity to the reference expression profile associated with relapse in the 

leukemia risk group to which the subject affected by ALL is assigned to thereby choose
a therapy for said subject affected by ALL.

24. The method of claim 23, wherein the step of assigning the subject
affected by leukemia to a leukemia risk group is performed according to the method 
of claim 1. 

25. The method of claim 23, wherein said subject affected by leukemia is

assigned to the T-ALL risk group and said subject expression profile and said 
reference expression profile comprise values representing the expression levels of at 
least 8 genes selected from the genes shown in Table 44. 

26. The method of claim 23, wherein said subject affected by leukemia is

assigned to the Hyperdiploid >50 risk group and said subject expression profile and 
said reference expression profile comprise values representing the expression levels of 
at least 5 genes selected from the genes shown in Table 45. 

27. The method of claim 23, wherein said subject affected by leukemia is

assigned to the TEL-AML1 risk group and said subject expression profile and said 
reference expression profile comprise values representing the expression levels of at 
least 3 genes selected from the genes shown in Table 46. 

28. The method of claim 23, wherein said subject affected by leukemia is

assigned to the MLL risk group and said subject expression profile and said reference 
expression profile comprise values representing the expression levels of at least 5 
genes selected from the genes shown in Table 47. 



29. The method of claim 23, wherein said subject affected by leukemia is
not assigned to the T-ALL, Hyperdiploid >50, TEL-AML1, MLL, E2A-PBX1, or
BCR-ABL risk group and said subject expression profile and said reference 
expression profile comprise values representing the expression levels of at least 4 
genes selected from the genes shown in Table 48. 


30. A method of choosing a therapy for a subject affected by TEL-AML1, 
said method comprising:

a) providing a subject expression profile of a sample from said
subject affected by TEL-AML1; 

b) providing a reference expression profile associated with the 
occurrence of secondary AML in subjects affected by TEL-AML1 wherein the 

subject expression profile and the reference expression profile comprise one or more
values representing the expression level of a gene having differential expression in 
subjects affected by TEL-AML1 who will develop secondary AML; and 

c) determining whether the subject expression profile shares
sufficient similarity to the reference expression profile associated with the occurrence
of secondary AML to thereby choose a therapy for the subject affected by TEL-AML1.

31. The method of claim 30, wherein said subject expression profile and
said reference expression profile comprise values representing the expression levels of 
at least 7 genes selected from the genes shown in Table 48. 


32. A method to aid in the determination of a prognosis for a subject
affected by leukemia, said method comprising: 

a) providing a subject expression profile of a sample from said 
subject affected by leukemia; 

b) providing a plurality of reference expression profiles, each
associated with a leukemia risk group selected from the group consisting of T-ALL,
E2A-PBX1, TEL-AML1, BCR-ABL, MLL, Hyperdiploid >50, and Novel, wherein
the subject expression profile and each reference expression profile comprise one or
more values representing the expression level of a gene having differential
expression in at least one leukemia risk group; and

c) selecting the reference expression profile most similar to the 
subject expression profile to thereby determine the prognosis for the subject affected 
by leukemia. 

33. A method to aid in the determination of the prognosis for a subject
affected by leukemia, said method comprising the steps of:

a) assigning the subject affected by leukemia to a leukemia risk
group selected from the group consisting of T-ALL, Hyperdiploid >50, TEL-AML1,
MLL, E2A-PBX1, BCR-ABL, or Novel risk group; 

b) providing a subject expression profile of a sample from said 
subject affected by leukemia;

c) providing a reference expression profile associated with the 
occurrence of relapse in the leukemia risk group to which the subject affected by 
leukemia is assigned, wherein the subject expression profile and the reference 
expression profile comprise one or more values representing the expression level of a 

gene having differential expression in subjects who will relapse after conventional
therapy; and

d) determining whether the subject expression profile shares 
sufficient similarity to the reference expression profile associated with relapse in the 
leukemia risk group to which the subject affected by leukemia is assigned to thereby
determine the prognosis for the subject affected by leukemia.

34. A method to aid in the determination of the prognosis for a subject 
affected by TEL-AML1, said method comprising: 

a) providing a subject expression profile of a sample from said 
subject affected by TEL-AML1;

b) providing a reference expression profile associated with the 
occurrence of secondary AML in subjects affected by TEL-AML1 wherein the
subject expression profile and the reference expression profile comprise one or more 
values representing the expression level of a gene having differential expression in 

subjects affected by TEL-AML1 who will develop secondary AML after conventional
therapy; and 

c) determining whether the subject expression profile shares 
sufficient similarity to the reference expression profile associated with the occurrence 
of secondary AML to thereby determine the prognosis for the subject affected by 

TEL-AML1.

35. A method of assigning a subject affected by ALL to an ALL risk group
selected from the group consisting of T-ALL, E2A-PBX1, TEL-AML1, BCR-ABL, 
MLL, Hyperdiploid >50, and Novel, said method comprising: 

a) providing a subject expression profile of a sample from said
subject affected by ALL;

b) providing a reference expression profile associated with the T- 
ALL risk group wherein the subject expression profile and the reference expression 
profile comprises one or more values representing the expression level of a gene 
having differential expression in the T-ALL risk group; 

c) determining whether the subject expression profile shares

statistically significant similarity to the reference expression profile associated with 
the T-ALL risk group to thereby determine whether the subject affected by ALL is in 
the T-ALL risk group; 

d) if the subject affected by ALL is not in the T-ALL risk group,

providing a reference expression profile associated with the E2A-PBX1 risk group 
wherein the subject expression profile and the reference expression profile comprises 
one or more values representing the expression level of a gene having differential 
expression in the E2A-PBX1 risk group; 

e) determining whether the subject expression profile shares

statistically significant similarity to the reference expression profile associated with 
the E2A-PBX1 risk group to thereby determine whether the subject affected by ALL 
is in the E2A-PBX1 risk group; 

f) if the subject affected by ALL is not in the E2A-PBX1 risk
group, providing a reference expression profile associated with the TEL-AML1 risk
group wherein the subject expression profile and each reference expression profile
comprises one or more values representing the expression level of a gene having
differential expression in the TEL-AML1 risk group; 

g) determining whether the subject expression profile shares 
statistically significant similarity to the reference expression profile associated with

the TEL-AML1 risk group to thereby determine whether the subject affected by ALL 
is in the TEL-AML1 risk group;

h) if the subject affected by ALL is not in the TEL-AML1 risk
group, providing a reference expression profile associated with the BCR-ABL risk 
group wherein the subject expression profile and each reference expression profile 
comprises one or more values representing the expression level of a gene having 

differential expression in the BCR-ABL risk group;

i) determining whether the subject expression profile shares 
statistically significant similarity to the reference expression profile associated with 
the BCR-ABL risk group to thereby determine whether the subject affected by ALL is 
in the BCR-ABL risk group; 

j) if the subject affected by ALL is not in the BCR-ABL risk

group, providing a reference expression profile associated with the MLL risk group 
wherein the subject expression profile and each reference expression profile 
comprises one or more values representing the expression level of a gene having 
differential expression in the MLL risk group; 

k) determining whether the subject expression profile shares

statistically significant similarity to the reference expression profile associated with 
the MLL risk group to thereby determine whether the subject affected by ALL is in 
the MLL risk group; 

l) if the subject affected by ALL is not in the MLL risk group,
providing a reference expression profile associated with the Hyperdiploid >50 risk
group wherein the subject expression profile and each reference expression profile 
comprises one or more values representing the expression level of a gene having 
differential expression in the Hyperdiploid >50 risk group; 

m) determining whether the subject expression profile shares 

statistically significant similarity to the reference expression profile associated with
the Hyperdiploid >50 risk group to thereby determine whether the subject affected by
ALL is in the Hyperdiploid >50 risk group; 

n) if the subject affected by ALL is not in the Hyperdiploid >50 
risk group, providing a reference expression profile associated with the Novel risk 

group wherein the subject expression profile and each reference expression profile
comprises one or more values representing the expression level of a gene having 
differential expression in the Novel risk group; and

o) determining whether the subject expression profile shares
statistically significant similarity to the reference expression profile associated with 
the Novel risk group to thereby determine whether the subject affected by ALL is in 
the Novel risk group. 
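
Purely as an illustrative sketch, the ordered sequence of steps (b) through (o) of claim 35 might be expressed in Python as a cascade in which each risk group is tested only if every earlier group has been ruled out. The order of the groups follows the claim. The test for "statistically significant similarity" is not specified by the claim, so it is passed in as a callable; the function names, the toy score-based test, and the 0.5 cutoff are assumptions of this sketch.

```python
# Order in which the risk groups are tested, as recited in claim 35.
CASCADE_ORDER = [
    "T-ALL",
    "E2A-PBX1",
    "TEL-AML1",
    "BCR-ABL",
    "MLL",
    "Hyperdiploid >50",
    "Novel",
]


def assign_risk_group(subject, similarity_tests):
    """Walk the cascade and return the first risk group whose test passes.

    subject:          any representation of the subject expression profile
    similarity_tests: dict mapping risk group name -> callable(subject) -> bool,
                      standing in for the comparison against that group's
                      reference expression profile
    Returns None if the subject matches no group.
    """
    for group in CASCADE_ORDER:
        if similarity_tests[group](subject):
            return group
    return None


def make_toy_tests(cutoff=0.5):
    """Build hypothetical tests that read a precomputed per-group score."""
    return {
        group: (lambda s, g=group: s.get(g, 0.0) > cutoff)
        for group in CASCADE_ORDER
    }


toy_tests = make_toy_tests()
```

Because the groups are tested in order, a subject that would satisfy both the T-ALL test and a later test is assigned to T-ALL, and the later comparisons are never performed.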


36. An array for use in a method of assigning a subject affected by
leukemia to a leukemia risk group comprising a substrate having a plurality of 
addresses, wherein each address has disposed thereon a capture probe that can 
specifically bind a nucleic acid molecule selected from the group consisting of: 

a) a nucleic acid molecule that is differentially expressed in at
least one leukemia risk group selected from the group consisting of T-ALL,
E2A-PBX1, TEL-AML1, BCR-ABL, MLL, Hyperdiploid >50, and Novel;

b) a nucleic acid molecule that is differentially expressed in 
subjects affected by leukemia who will relapse after conventional therapy; and 

c) a nucleic acid molecule that is differentially expressed in

subjects affected by leukemia who will develop secondary AML after conventional 
therapy. 

37. The array of claim 36, wherein each nucleic acid molecule that is 

differentially expressed in at least one leukemia risk group is selected from the group
consisting of the genes shown in Tables 2-36, 63-68, and 70-74. 

38. The array of claim 36, wherein each nucleic acid molecule that is 
differentially expressed in subjects affected by leukemia who will relapse after 

conventional therapy is selected from the group consisting of the genes shown in
Tables 44-48. 

39. The array of claim 36, wherein each nucleic acid molecule that is 
differentially expressed in subjects affected by leukemia who will develop secondary 

AML after conventional therapy is selected from the group consisting of the genes
shown in Table 52.

40. The array of claim 36, wherein the substrate has greater than 20
addresses. 



41. The array of claim 40, wherein the substrate has greater than 40
addresses.



42. The array of claim 41, wherein the substrate has greater than 68 
addresses. 



43. The array of claim 36, wherein the substrate has no more than 500

addresses. 

44. A kit for assigning a subject affected by ALL to a leukemia risk group, 
said kit comprising: 

a) an array comprising a substrate having a plurality of addresses,

wherein each address has disposed thereon a capture probe that can specifically bind a 
nucleic acid molecule that is differentially expressed in at least one leukemia risk 
group selected from the group consisting of T-ALL, E2A-PBX1, TEL-AML1, BCR- 
ABL, MLL, Hyperdiploid >50, and Novel; and 

b) a computer-readable medium having a plurality of digitally-
encoded expression profiles wherein each profile of the plurality has a plurality of
values, each value representing the expression of a nucleic acid molecule detected by 
the array. 



45. A kit for assigning a subject affected by ALL to a leukemia risk group,

said kit comprising: 

a) an array according to claim 37; and 

b) a computer-readable medium having a plurality of digitally- 
encoded expression profiles wherein each profile of the plurality has a plurality of 

values, each value representing the expression of a nucleic acid molecule detected by
the array.

46. A kit for predicting whether a subject affected by leukemia has an
increased risk of relapse, said kit comprising: 

a) an array comprising a substrate having a plurality of addresses, 
wherein each address has disposed thereon a capture probe that can specifically bind a 

nucleic acid molecule that is differentially expressed in subjects affected by leukemia
who will relapse following conventional therapy; and 

b) a computer-readable medium having a plurality of digitally- 
encoded expression profiles wherein each profile of the plurality has a plurality of 
values, each value representing the expression of a nucleic acid molecule detected by 

the array.

47. A kit for predicting whether a subject affected by leukemia has an 
increased risk of relapse, said kit comprising: 

a) an array according to claim 38; and

b) a computer-readable medium having a plurality of digitally-
encoded expression profiles wherein each profile of the plurality has a
plurality of values, each value representing the expression of a nucleic acid 
molecule detected by the array. 

48. A kit for predicting whether a subject affected by TEL-AML1 has an

increased risk of relapse, said kit comprising: 

a) an array comprising a substrate having a plurality of addresses, 
wherein each address has disposed thereon a capture probe that can specifically bind a 
nucleic acid molecule that is differentially expressed in subjects affected by
TEL-AML1 who will relapse after conventional therapy; and

b) a computer-readable medium having a plurality of digitally- 
encoded expression profiles wherein each profile of the plurality has a plurality of 
values, each value representing the expression of a nucleic acid molecule detected by 
the array. 






49. A kit for predicting whether a subject affected by TEL-AML1 has an 
increased risk of relapse, said kit comprising: 

a) an array according to claim 39; and

b) a computer-readable medium having a plurality of digitally-
encoded expression profiles wherein each profile of the plurality has a
plurality of values, each value representing the expression of a nucleic acid 
molecule detected by the array. 


50. A kit to aid in choosing therapy for a subject affected by leukemia, said 
kit comprising: 

a) an array comprising a substrate having a plurality of addresses, 
wherein each address has disposed thereon a capture probe that can specifically bind a 

nucleic acid molecule that is differentially expressed in at least one leukemia risk
group selected from the group consisting of T-ALL, E2A-PBX1, TEL-AML1,
BCR-ABL, MLL, Hyperdiploid >50, and Novel; and

b) a computer-readable medium having a plurality of digitally- 
encoded expression profiles wherein each profile of the plurality has a plurality of 

values, each value representing the expression of a nucleic acid molecule detected by
the array. 

51. A kit to aid in choosing therapy for a subject affected by leukemia, said 
kit comprising: 

a) an array according to claim 37; and

b) a computer-readable medium having a plurality of digitally- 
encoded expression profiles wherein each profile of the plurality has a 
plurality of values, each value representing the expression of a nucleic acid 
molecule detected by the array. 


52. A computer-readable medium comprising a plurality of digitally- 
encoded expression profiles wherein each profile of the plurality has a plurality of 
values, each value representing the expression of a gene that is differentially 
expressed in at least one leukemia risk group selected from the group consisting of
T-ALL, E2A-PBX1, TEL-AML1, BCR-ABL, MLL, Hyperdiploid >50, and Novel.

53. The computer readable medium of claim 52, wherein the expression 
profiles comprise values selected from the group consisting of:

a) values representing the expression levels of at least 7 genes
selected from the genes shown in Tables 2-8, 16-36, 54-60, and 63-68;

b) a value representing the expression level of the gene shown in 

Table 10; 

c) a value representing the expression level of the gene shown in

Table 14; 

d) values representing the expression levels of the genes shown in 
Tables 9, 11, 12, 13, and 15; and 

e) values representing the expression level of at least one gene 
shown in Tables 70, 71, 72, 73, and 74.

54. A computer-readable medium comprising a plurality of digitally- 
encoded expression profiles wherein each profile of the plurality has a plurality of 
values, each value representing the expression of a gene that is differentially 

expressed in subjects affected by leukemia who will relapse following conventional
therapy. 

55. The computer readable medium of claim 54, wherein the expression 
profiles comprise values selected from the group consisting of: 

a) values representing the expression levels of at least 8 genes
selected from the genes shown in Table 44;

b) values representing the expression levels of at least 5 genes 
selected from the genes shown in Table 45; 

c) values representing the expression levels of at least 3 genes 
selected from the genes shown in Table 46;

d) values representing the expression levels of at least 5 genes 
selected from the genes shown in Table 47; and 

e) values representing the expression levels of at least 4 genes 
selected from the genes shown in Table 48. 


56. A computer-readable medium comprising a plurality of digitally- 
encoded expression profiles wherein each profile of the plurality has a plurality of
values, each value representing the expression of a gene that is differentially
expressed in subjects affected by leukemia who will develop secondary AML. 

57. The computer readable medium of claim 56, wherein the expression 

profiles comprise values selected from values representing the expression levels of at
least 7 genes selected from the genes shown in Table 52.

58. The method of claim 1 wherein the subject expression profile and the 
reference expression profile associated with the T-ALL risk group comprise values 

selected from the group consisting of:

a) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 7; 

b) a value representing the expression level of the gene shown in 

Table 14; 

c) values representing the expression levels of at least 20 genes

selected from the genes shown in Table 21; 

d) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 28; 

e) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 35; and

f) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 59. 

59. The method of claim 1 wherein the subject expression profile and the 
reference expression profile associated with the E2A-PBX1 risk group comprise

values selected from the group consisting of: 

a) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 3; 

b) a value representing the expression level of the gene shown in 

Table 10;

c) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 17;

d) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 24; 

e) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 31; 

f) values representing the expression levels of at least 20 genes

selected from the genes shown in Table 55; 

g) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 64; and 

h) values representing the expression levels of at least one of the 
genes shown in Table 71.

60. The method of claim 1 wherein the subject expression profile and the 
reference expression profile associated with the TEL-AML1 risk group comprise 
values selected from the group consisting of: 

a) values representing the expression levels of at least 20 genes

selected from the genes shown in Table 8; 

b) values representing the expression levels of the genes shown in 

Table 15; 

c) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 22;

d) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 29; 

e) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 36; and 

f) values representing the expression levels of at least 20 genes

selected from the genes shown in Table 55. 

61. The method of claim 1 wherein the subject expression profile and the
reference expression profile associated with the BCR-ABL risk group comprise 

values selected from the group consisting of:

a) values representing the expression level of at least 20 genes 
selected from the genes shown in Table 2;

b) values representing the expression levels of the genes shown in

Table 9; 

c) values representing the expression level of at least 20 genes 
selected from the genes shown in Table 16;

d) values representing the expression levels of at least 20 genes

selected from the genes shown in Table 23; 

e) values representing the expression levels of at least 20 genes
selected from the genes shown in Table 30; and 

f) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 54.

62. The method of claim 1 wherein the subject expression profile and the 
reference expression profile associated with the MLL risk group comprise values
selected from the group consisting of: 

a) values representing the expression levels of at least 20 genes

selected from the genes shown in Table 5; 

b) values representing the expression levels of the genes shown in 

Table 12; 

c) values representing the expression level of at least 20 genes 
selected from the genes shown in Table 19;

d) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 26; 

e) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 33; and 

f) values representing the expression levels of at least 20 genes

selected from the genes shown in Table 57. 

63. The method of claim 1 wherein the subject expression profile and the 
reference expression profile associated with the Hyperdiploid >50 risk group 

comprise values selected from the group consisting of:

a) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 4;

b) values representing the expression levels of the genes shown in

Table 11; 

c) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 18; 

d) values representing the expression levels of at least 20 genes

selected from the genes shown in Table 25; 

e) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 32; and 

f) values representing the expression levels of at least 20 genes 
selected from the genes shown in Table 56.

64. The array of claim 36, wherein each nucleic acid molecule that is 
differentially expressed in at least one leukemia risk group is selected from the group 
consisting of the genes shown in Tables 2-36. 

readable media which is not patentable subject matter.

This application contains the following inventions or groups of inventions which are not so linked as to form a single general 
inventive concept under PCT Rule 13.1. In order for all inventions to be examined, the appropriate additional examination fees must 
be paid. 

Group I, claim(s) 1, 9-13, 36, 40-44, 46, 48, and 50, drawn to a method of assigning a leukemia patient expression profile to a risk 
group and apparatus for performing the method (1st method and 1st apparatus). 

Group II, claim(s) 14, drawn to a method of determining prognosis of leukemia relapse (2nd method). 

Group III, claim(s) 21, drawn to a method of determining prognosis of secondary AML in a subject affected by TEL-AML1 (3rd 
method). 

Group IV, claim(s) 22, drawn to a method of choosing a therapy for a subject affected by leukemia by comparing expression profiles 
of the subject to expression profiles of subjects in different risk groups (4th method). 

Group V, claim(s) 23, drawn to a method of choosing a therapy for a subject affected by leukemia by comparing expression profiles 
of the subject to expression profiles of subjects who will relapse (5th method). 

Group VI, claim(s) 30, drawn to a method of choosing a therapy for a subject affected by TEL-AML1 by comparing expression 
profiles of the subject to expression profiles of subjects who will develop secondary AML (6th method). 

Group VII, claim(s) 32, drawn to a method of determining the prognosis of a subject affected by leukemia by comparing expression 
profiles of the subject to expression profiles of subjects in different risk groups (7th method). 

Group VIII, claim(s) 33, drawn to a method of determining the prognosis of a subject affected by leukemia by assigning the subject to 
a risk group and then comparing expression profiles of the subject to expression profiles of subjects in the same risk group who have 
relapsed (8th method). 

Group IX, claim(s) 34, drawn to a method of determining the prognosis of a TEL-AML1 subject by comparing expression profiles of 
the subject to expression profiles of subjects affected by TEL-AML1 (9th method). 

Group X, claim(s) 35, drawn to a method of assigning a subject affected by ALL to an ALL risk group by comparing expression 
profiles of the subject to expression profiles of subjects in different risk groups (10th method). 

This application contains claims directed to more than one species of the generic invention. These species are deemed to lack unity of 
invention because they are not so linked as to form a single general inventive concept under PCT Rule 13.1. 

In order for more than one species to be examined, the appropriate additional examination fees must be paid. The species are as 
follows: 

The seven risk group species are 1) T-ALL, 2) E2A-PBX1, 3) TEL-AML1, 4) BCR-ABL, 5) MLL, 6) Hyperdiploid >50, and 7) 
Novel. 

The claims are deemed to correspond to the species listed above in the following manner: 

Claims 1, 40-43, and 50 of Group I and claims 14, 21, 22, 23, 30, 32, 33, 34, and 35 of Groups II-X are Markush-type claims. 
Claims 9-13 of Group I are drawn to the ALL species. Claim 48 of Group I is drawn to the TEL-AML1 species. 

The following claim(s) are generic: 44 and 46 of Group I. 

The inventions listed as Groups I-X do not relate to a single general inventive concept under PCT Rule 13.1 because, under PCT Rule 
13.2, they lack the same or corresponding special technical features for the following reasons: PCT Rule 13.1 and Annex B do not 
provide for unity of invention between two or more different products, methods of making, methods of use, or apparatus that share a 
special technical feature. Each Group is drawn to a different method with different steps and produces different results. 

The species listed above do not relate to a single general inventive concept under PCT Rule 13.1 because, under PCT Rule 13.2, the 
species lack the same or corresponding special technical features for the following reasons: each species is drawn to a mutually 
exclusive different disease risk group. 